Similar literature
 20 similar articles found (search time: 15 ms)
1.
2.
MOTIVATION: At present, computational gene identification methods for microbial genomes predict verified translation termination sites (3' end) with high accuracy, but predict translation initiation sites (TIS, 5' end) with much lower accuracy. The latter is important for analyzing and understanding the putative protein product of a gene and the regulatory machinery of translation. Improving the accuracy of TIS prediction remains an open problem. RESULTS: In this paper, we develop a four-component statistical model to describe the TIS of prokaryotic genes. The model incorporates several biologically meaningful features: the correlation between the translation termination site and the TIS of genes, the sequence content around the start codon, the sequence content of the consensus signal related to ribosomal binding sites (RBSs), and the correlation between the TIS and the upstream consensus signal. An entirely unsupervised training system is constructed, which takes as input a set of coding open reading frames (ORFs) annotated by any gene finder and gives as output a set of organism-specific parameters (without any prior knowledge or empirical constants and formulas). The novel algorithm is tested on a set of reliable datasets of genes from Escherichia coli and Bacillus subtilis. MED-Start correctly predicts 95.4% of the start sites of 195 experimentally confirmed E. coli genes and 96.6% of 58 reliable B. subtilis genes. Moreover, the test results indicate that the algorithm gives higher accuracy for more reliable datasets and is robust to variation in gene length. MED-Start may be used as a postprocessor for a gene finder. After processing by our program, the gene start prediction of a gene finder improves markedly; e.g. the accuracy of TIS predicted by MED 1.0 increases from 61.7 to 91.5% for 854 verified E. coli genes, while that of GLIMMER 2.02 increases from 63.2 to 92.0% for the same dataset.
These results show that our algorithm is one of the most accurate methods for identifying the TIS of prokaryotic genomes. AVAILABILITY: The program MED-Start can be accessed through the website of CTB at Peking University: http://ctb.pku.edu.cn/main/SheGroup/MED_Start.htm.

3.

Background

Expressed Sequence Tag (EST) sequences are generally single-strand, single-pass sequences that are only 200–600 nucleotides long, contain errors resulting in frame shifts, and represent different parts of their parent cDNA. If the cDNAs contain translation initiation sites, they may be suitable for functional genomics studies. We have compared five methods to predict translation initiation sites in EST data: first-ATG, ESTScan, Diogenes, NetStart, and ATGpr.

Results

A dataset of 100 EST sequences, 50 with and 50 without translation initiation sites, was created. Based on analysis of this dataset, ATGpr is found to be the most accurate for predicting the presence versus absence of translation initiation sites. With a maximum accuracy of 76%, ATGpr more accurately predicts the position or absence of translation initiation sites than NetStart (57%) or Diogenes (50%). ATGpr similarly excels when start sites are known to be present (90%), whereas NetStart achieves only 60% overall accuracy. As a baseline for comparison, choosing the first ATG correctly identifies the translation initiation site in 74% of the sequences. ESTScan and Diogenes, consistent with their intended use, are able to identify open reading frames, but are unable to determine the precise position of translation initiation sites.
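
The first-ATG baseline mentioned above is simple enough to sketch in a few lines of Python. This is an illustrative sketch only, not the code used in the study; the function name `first_atg` and the example sequence are hypothetical.

```python
from typing import Optional


def first_atg(seq: str) -> Optional[int]:
    """Return the 0-based index of the first ATG in seq, or None if there is none.

    Hypothetical helper: it only illustrates the "choose the first ATG" baseline.
    """
    pos = seq.upper().find("ATG")
    return pos if pos >= 0 else None


# Made-up EST fragment with two ATGs; the baseline picks the 5'-most one.
est = "ggcaccATGgctagcATGtaa"
print(first_atg(est))
print(first_atg("ccgtta"))
```

A sequence with no ATG yields `None`, which corresponds to predicting that no translation initiation site is present.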

Conclusions

ATGpr demonstrates high sensitivity, specificity, and overall accuracy in identifying start sites while also rejecting incomplete sequences. A database of EST sequences suitable for validating programs for translation initiation site prediction is now available. These tools and materials may open an avenue for future improvements in start site prediction and EST analysis.

4.
The classical model of prokaryotic translation initiation, based on the central role of interactions between mRNA and 16S rRNA, was proposed more than 30 years ago by Shine and Dalgarno. Since then, owing to rapid progress in genome sequencing and to novel technical approaches, basic research has substantially enriched our knowledge of the problem. The present review focuses on bioinformatic data as well as on experimental results obtained in vivo and in vitro, which show the diversity of molecular mechanisms for ribosome recruitment in prokaryotes.

5.
More than 30 years ago Shine and Dalgarno proposed a classic model of prokaryotic translation initiation, based on the central role of the mRNA-16S rRNA interactions. Since then basic research has greatly extended the view of this process, owing to rapid progress in experimental techniques and genome sequencing. This review focuses on bioinformatic data and experimental results obtained in vitro and in vivo, demonstrating the diversity of molecular mechanisms for ribosome recruitment in prokaryotes.

6.
SUMMARY: We provide the tool 'TICO' (Translation Initiation site COrrection) for improving the results of conventional gene finders for prokaryotic genomes with regard to exact localization of the translation initiation site (TIS). Currently, TICO provides an interface for direct post-processing of the predictions obtained from the widely used program GLIMMER. Our program is based on a clustering algorithm for completely unsupervised scoring of potential TIS locations. AVAILABILITY: Our tool can be freely accessed through a web interface at http://tico.gobics.de/ CONTACT: maike@gobics.de

7.
Toxin/antitoxin (TA) systems, viewed as essential regulators of growth arrest and programmed cell death, are widespread among prokaryotes, but remain sparsely annotated. We present RASTA-Bacteria, an automated method allowing quick and reliable identification of TA loci in sequenced prokaryotic genomes, whether they are annotated open reading frames or not. The tool successfully confirmed all reported TA systems, and spotted new putative loci upon screening of sequenced genomes. RASTA-Bacteria is publicly available at .

8.

Background

The Shine-Dalgarno (SD) signal has long been viewed as the dominant translation initiation signal in prokaryotes. Recently, leaderless genes, which lack 5'-untranslated regions (5'-UTR) on their mRNAs, have been shown to be abundant in archaea. However, current large-scale in silico analyses of initiation mechanisms in bacteria are mainly based on SD-led initiation rather than leaderless initiation. The study of leaderless genes in bacteria remains open, which leaves the understanding of translation initiation mechanisms in prokaryotes uncertain.

Results

Here, we study signals in the translation initiation regions of all genes in 953 bacterial and 72 archaeal genomes, then attempt to construct an evolutionary scenario for leaderless genes in bacteria. With an algorithm designed to identify multiple signals in the upstream regions of genes in a genome, we classify all genes into SD-led, TA-led and atypical genes according to the category of the most probable signal in their upstream sequences. In particular, the occurrence of TA-like signals about 10 bp upstream of the translation initiation site (TIS) in bacteria most probably indicates leaderless genes.

Conclusions

Our analysis reveals that leaderless genes are widespread, although not dominant, in a variety of bacteria. In Actinobacteria and Deinococcus-Thermus in particular, more than twenty percent of genes are leaderless. Our analysis of closely related bacterial genomes implies that the change of translation initiation mechanism between genes deriving from a common ancestor is linearly dependent on the phylogenetic relationship. Analysis of the macroevolution of leaderless genes further shows that the proportion of leaderless genes in bacteria has a decreasing trend in evolution.

9.
Feature selection for the prediction of translation initiation sites   Total citations: 3, self-citations: 0, citations by others: 3
Translation initiation sites (TISs) are important signals in cDNA sequences. In many previous attempts to predict TISs in cDNA sequences, three major factors affect the prediction performance: the nature of the cDNA sequence sets, the relevant features selected, and the classification methods used. In this paper, we examine different approaches to select and integrate relevant features for TIS prediction. The top selected significant features include the features from the position weight matrix and the propensity matrix, the number of nucleotide C in the sequence downstream of the ATG, the number of downstream stop codons, the number of upstream ATGs, and the number of some amino acids, such as amino acids A and D. With the numerical data generated from these features, different classification methods, including decision tree, naive Bayes, and support vector machine, were applied to three independent sequence sets. The identified significant features were found to be biologically meaningful, while the experiments showed promising results.
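
Three of the count-based features named above (upstream ATGs, downstream C nucleotides, downstream stop codons) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `tis_features`, the dictionary keys, and the choice to count only in-frame stop codons are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def tis_features(seq: str, atg_pos: int) -> dict:
    """Compute simple count features around a candidate ATG at index atg_pos.

    Assumption: stop codons are counted only in the reading frame of the
    candidate ATG; the abstract does not say which frame(s) were used.
    """
    seq = seq.upper()
    upstream = seq[:atg_pos]
    downstream = seq[atg_pos + 3:]
    # Split the downstream region into in-frame codons, ignoring a partial tail.
    codons = [downstream[i:i + 3] for i in range(0, len(downstream) - 2, 3)]
    return {
        "upstream_atg": upstream.count("ATG"),
        "downstream_c": downstream.count("C"),
        "downstream_inframe_stops": sum(c in STOP_CODONS for c in codons),
    }


# Made-up cDNA fragment with the candidate ATG at index 6.
print(tis_features("CATGACATGGCCCTGTAAGC", 6))
```

Feature dictionaries of this kind could then be passed to any of the classifiers the abstract lists; the position weight matrix and propensity matrix features are not covered by this sketch.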

10.
We describe a new method for identifying the sequences that signal the start of translation, and the boundaries between exons and introns (donor and acceptor sites) in human mRNA. According to the mandatory keyword, ORGANISM, and feature key, CDS, a large set of standard data for each signal site was extracted from the ASCII flat file, gbpri.seq, in the GenBank release 108.0. This was used to generate the scoring matrices, which summarize the sequence information for each signal site. The scoring matrices take into account the independent nucleotide frequencies between adjacent bases in each position within the signal site regions, and the relative weight on each nucleotide in proportion to their probabilities in the known signal sites. Using a scoring scheme that is based on the nucleotide scoring matrices, the method has great sensitivity and specificity when used to locate signals in uncharacterized human genomic DNA. These matrices are especially effective at distinguishing true and false sites.

11.
12.
Local secondary structures in coding sequences have important functions across various translational processes. To date, however, the local structures and their functions in the early stage of translation elongation remain poorly understood. Here, we surveyed the structural stability in the first 180 nucleotides of the coding sequence of 27 species using a computational method. We found that the structural stability in the 30–80 nucleotide interval was significantly higher than that in other regions in eukaryotes and most prokaryotes. No significant correlation between local translation efficiency and structural stability was observed, suggesting that this structural region has undergone selection pressure directly to maintain high stability. Furthermore, ribosomes were blocked by this region, providing an opportunity for co-translational regulation. Remarkably, in eukaryotes, we found that mRNAs with higher structural stability in the 30–80 nucleotide interval tended to encode secreted proteins. Overall, our results revealed a previously unappreciated correlation between structural stability and protein localization.

13.
F Rodier, J Sallantin. Biochimie, 1985, 67(5): 533-539
Learning processes are applied to the recognition of protein coding regions in prokaryotes. Non-contradictory statistical and logical rules are deduced from a set of known examples of coding sequences. These rules make it possible to build characteristic patterns on the mRNA upstream of the initiating codon. These rules are applied with success to recognize more than 180 coding sequences and to detect and/or eliminate hypothetical reading frames or unknown genes.

14.
15.
Translation is a key process for gene expression. Timely identification of the translation initiation site (TIS) is very important for conducting in-depth genome analysis. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying TIS. Although some computational methods were proposed in this regard, none of them considered the global or long-range sequence-order effects of DNA, and hence their prediction quality was limited. To account for such effects, a new predictor, called “iTIS-PseTNC,” was developed by incorporating physicochemical properties into the pseudo trinucleotide composition, quite similar to the PseAAC (pseudo amino acid composition) approach widely used in computational proteomics. A rigorous cross-validation test on the benchmark dataset showed that the overall success rate achieved by the new predictor in identifying TIS locations was over 97%. As a web server, iTIS-PseTNC is freely accessible at http://lin.uestc.edu.cn/server/iTIS-PseTNC. To maximize the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results without the need to go through the detailed mathematical equations, which are presented in this paper just for the integrity of the new prediction method.
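
The pseudo trinucleotide composition described above builds on the plain trinucleotide composition of a sequence, a 64-dimensional frequency vector. The Python sketch below computes only that base vector; it is not the iTIS-PseTNC method. The physicochemical correlation terms that capture sequence-order effects are omitted, because their values are not given in the abstract, and the names `TRINUCLEOTIDES` and `trinucleotide_composition` are hypothetical.

```python
from itertools import product
from typing import List

# All 64 trinucleotides in a fixed lexicographic order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_composition(seq: str) -> List[float]:
    """Return normalized frequencies of overlapping trinucleotides in seq.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


vec = trinucleotide_composition("ATGATG")
print(len(vec), vec[TRINUCLEOTIDES.index("ATG")])
```

A pseudo-composition in the PseAAC style would append a small number of correlation terms to this vector before classification.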

16.
17.
Initiation of mRNA translation in prokaryotes   Total citations: 57, self-citations: 0, citations by others: 57
C O Gualerzi, C L Pon. Biochemistry, 1990, 29(25): 5881-5889

18.
Poliovirus translation: a paradigm for a novel initiation mechanism   Total citations: 7, self-citations: 0, citations by others: 7
All eukaryotic cellular mRNAs, and most viral mRNAs, are blocked at their 5' ends with a cap structure (m7GpppX, where X is any nucleotide). Poliovirus, along with a small number of other animal and plant viral mRNAs, does not contain a 5' cap structure. Since the cap structure functions to facilitate ribosome binding to mRNA, translation of poliovirus must proceed by a cap-independent mechanism. Consistent with this, recent studies have shown that ribosomes can bind to an internal region within the long 5' noncoding sequence of poliovirus RNA. Possible mechanisms for cap-independent translation are discussed. Cap-independent translation of poliovirus RNA is of major importance to the mechanism of shut-off of host protein synthesis after infection. Moreover, it is likely to play a role in determining poliovirus neurovirulence and attenuation.

19.
The ever-growing number of completely sequenced prokaryotic genomes facilitates cross-species comparisons by genomic annotation algorithms. This paper introduces a new probabilistic framework for comparative genomic analysis and demonstrates its utility in the context of improving the accuracy of prokaryotic gene start site detection. Our framework employs a product hidden Markov model (PROD-HMM) with state architecture to model the species-specific trinucleotide frequency patterns in sequences immediately upstream and downstream of a translation start site and to detect the contrasting non-synonymous (amino acid changing) and synonymous (silent) substitution rates that differentiate prokaryotic coding from intergenic regions. Depending on the intricacy of the features modeled by the hidden state architecture, intergenic, regulatory, promoter and coding regions can be delimited by this method. The new system is evaluated using a preliminary set of orthologous Pyrococcus gene pairs, for which it demonstrates an improved accuracy of detection. Its robustness is confirmed by analysis with cross-validation of an experimentally verified set of Escherichia coli K-12 and Salmonella typhimurium LT2 orthologs. The novel architecture has a number of attractive features that distinguish it from previous comparative models such as pair-HMMs.

20.
MOTIVATION: It is well accepted that the 3' end of 16S rRNA is directly involved in prokaryotic translation initiation by pairing with the Shine-Dalgarno (SD) sequence, which is located in the ribosome-binding site of mRNA. According to Shine and Dalgarno, the 5' UTR of Escherichia coli has the pattern 'AGGAGG' (SD sequence), which is complementary to the 3' end sequence of 16S rRNA. In this work, we systematically calculated free-energy values of the base pairing between the 3' end of 16S rRNA and the 5' UTR of mRNA, in order to analyze the base-pairing potentials in various prokaryotes. The free-energy values were then plotted over distances from the start codon to visualize the free-energy pattern of 5' UTRs. RESULTS: The average free-energy values fell sharply before the start codon in E. coli, which is consistent with the model that the 3' end of 16S rRNA base pairs with the SD sequence. Haemophilus influenzae, Bacillus subtilis and Helicobacter pylori show a similar pattern, suggesting that the organisms have basically the same mechanism of translation initiation as E. coli. Other eubacteria, such as Synechocystis PCC6803, Mycoplasma genitalium, Mycoplasma pneumoniae and Borrelia burgdorferi, also show decreases in their free-energy values, although they are less evident. We also did the same analysis with a eukaryote genome as a control; no fall in free-energy values was observed between the 3' end of 18S rRNA and 5' UTRs of Saccharomyces cerevisiae, suggesting that this organism does not base pair in translation initiation. The three archaebacteria A. fulgidus, M. jannaschii and M. thermoautotrophicum show patterns similar to eubacteria, but not to S. cerevisiae, indicating that archaebacteria are closer to eubacteria than to eukaryotes with respect to the mechanism of translation initiation. From these observations, it appears that the shape of the curve produced by the algorithm can be used to predict the mechanism of translation initiation.
AVAILABILITY: The C programs used in our analysis are available upon request.
