Similar articles
Found 20 similar articles (search time: 0 ms)
1.
MOTIVATION: At present, computational gene identification methods for microbial genomes predict the verified translation termination site (3' end) with high accuracy, but the translation initiation site (TIS, 5' end) with much lower accuracy. The latter is important for analyzing and understanding the putative protein of a gene and the regulatory machinery of translation. Improving the accuracy of TIS prediction is one of the remaining open problems. RESULTS: In this paper, we develop a four-component statistical model to describe the TIS of prokaryotic genes. The model incorporates several features with biological meaning: the correlation between the translation termination site and the TIS of genes, the sequence content around the start codon, the sequence content of the consensus signal related to ribosomal binding sites (RBSs), and the correlation between the TIS and the upstream consensus signal. An entirely unsupervised training system is constructed, which takes as input a set of coding open reading frames (ORFs) annotated by any gene finder, and gives as output a set of organism-specific parameters (without any prior knowledge or empirical constants and formulas). The novel algorithm is tested on a set of reliable datasets of genes from Escherichia coli and Bacillus subtilis. MED-Start correctly predicts 95.4% of the start sites of 195 experimentally confirmed E. coli genes and 96.6% of 58 reliable B. subtilis genes. Moreover, the test results indicate that the algorithm gives higher accuracy for more reliable datasets and is robust to variation in gene length. MED-Start may be used as a postprocessor for a gene finder. After processing by our program, the improvement in gene start prediction of the gene finder system is remarkable, e.g. the accuracy of TIS predicted by MED 1.0 increases from 61.7 to 91.5% for 854 verified E. coli genes, while that by GLIMMER 2.02 increases from 63.2 to 92.0% for the same dataset. These results show that our algorithm is one of the most accurate methods to identify the TIS of prokaryotic genomes. AVAILABILITY: The program MED-Start can be accessed through the website of CTB at Peking University: http://ctb.pku.edu.cn/main/SheGroup/MED_Start.htm.

2.
Liu H  Han H  Li J  Wong L 《In silico biology》2004,4(3):255-269
The translation initiation site (TIS) prediction problem is about how to correctly identify TIS in mRNA, cDNA, or other types of genomic sequences. High prediction accuracy can be helpful in a better understanding of protein coding from nucleotide sequences. This is an important step in genomic analysis to determine protein coding from nucleotide sequences. In this paper, we present an in silico method to predict translation initiation sites in vertebrate cDNA or mRNA sequences. This method consists of three sequential steps as follows. In the first step, candidate features are generated using k-gram amino acid patterns. In the second step, a small number of top-ranked features are selected by an entropy-based algorithm. In the third step, a classification model is built to recognize true TISs by applying support vector machines or ensembles of decision trees to the selected features. We have tested our method on several independent data sets, including two public ones and our own extracted sequences. The experimental results achieved are better than those reported previously using the same data sets. Our high accuracy not only demonstrates the feasibility of our method, but also indicates that there might be "amino acid" patterns around TIS in cDNA and mRNA sequences.
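
The first step above (generating candidate features from k-gram amino acid patterns) might look roughly like the following sketch. It assumes the standard genetic code, an in-frame translation starting at the candidate ATG, and plain k-gram counts as features; the function names and the toy sequence are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}

def translate(dna):
    """Translate a DNA string codon by codon from its first base ('*' = stop)."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )

def kgram_counts(protein, k):
    """Count every overlapping k-gram of amino acids in a protein string."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))

# Toy example (made-up sequence): translate from a candidate ATG, count k-grams.
protein = translate("ATGGCCGATGCC")
print(protein)                   # MADA
print(kgram_counts(protein, 1))  # single amino acid counts
print(kgram_counts(protein, 2))  # dipeptide counts
```

Each k-gram count would then be one candidate feature for the later selection and classification steps.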

3.
Characterization of translational initiation sites in E. coli.   Cited by: 104 (self-citations: 34, citations by others: 104)
We characterize the Shine and Dalgarno sequence of 124 known gene beginnings. This information is used to make "rules" which help distinguish gene beginning from other sites in a library of over 78,000 bases of mRNA. Gene beginnings are found to have information besides the initiation codon and Shine and Dalgarno sequence which can be used to make better "rules".

4.
We have used a "Perceptron" algorithm to find a weighting function which distinguishes E. coli translational initiation sites from all other sites in a library of over 78,000 nucleotides of mRNA sequence. The "Perceptron" examined sequences as linear representations. The "Perceptron" is more successful at finding gene beginnings than our previous searches using "rules" (see previous paper). We note that the weighting function can find translational initiation sites within sequences that were not included in the training set.
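
To illustrate the general idea of a perceptron learning a weighting function over sequence positions, here is a minimal sketch. The one-hot encoding, window length, epoch limit, and the six toy windows are all assumptions made for illustration; they are not the data or settings used in the paper.

```python
# Minimal perceptron sketch; toy data and names are illustrative assumptions.
BASES = "ACGU"

def one_hot(window):
    """Encode an RNA window as a flat 0/1 vector (4 entries per position)."""
    vec = []
    for base in window:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec

def score(weights, bias, window):
    """Weighted sum over the encoded window plus the bias term."""
    return sum(w * x for w, x in zip(weights, one_hot(window))) + bias

def train_perceptron(windows, labels, epochs=50):
    """Classic perceptron rule: update only on misclassified examples."""
    weights = [0] * (4 * len(windows[0]))
    bias = 0
    for _ in range(epochs):
        errors = 0
        for window, label in zip(windows, labels):  # label is +1 or -1
            if label * score(weights, bias, window) <= 0:
                x = one_hot(window)
                weights = [w + label * xi for w, xi in zip(weights, x)]
                bias += label
                errors += 1
        if errors == 0:  # every training window classified correctly
            break
    return weights, bias

def predict(weights, bias, window):
    return 1 if score(weights, bias, window) > 0 else -1

# Made-up windows: positives end in AUG, negatives do not.
positives = ["AGGAAUG", "UGGAAUG", "AGGUAUG"]
negatives = ["CCCUACC", "UUUCGCA", "GCACUUA"]
windows = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

weights, bias = train_perceptron(windows, labels)
print([predict(weights, bias, w) for w in windows])
```

The learned `weights` play the role of the weighting function: each entry scores one base at one position of the window.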

5.
Initiation Factor 1 (IF1) is required for the initiation of translation in Escherichia coli. However, the precise function of IF1 remains unknown. Current evidence suggests that IF1 is an RNA-binding protein that sits in the A site of the decoding region of 16 S rRNA. IF1 binding to 30 S subunits changes the reactivity of nucleotides in the A site to chemical probes. The N1 position of A1408 is enhanced, while the N1 positions of A1492 and A1493 are protected from reactivity with dimethyl sulfate (DMS). The N1-N2 positions of G530 are also protected from reactivity with kethoxal. Quantitative footprinting experiments show that the dissociation constant for IF1 binding to the 30 S subunit is 0.9 microM and that IF1 also alters the reactivity of a subset of Class III sites that are protected by tRNA, 50 S subunits, or aminoglycoside antibiotics. IF1 enhances the reactivity of the N1 position of A1413, A908, and A909 to DMS and the N1-N2 positions of G1487 to kethoxal. To characterize this RNA-protein interaction, several ribosomal mutants in the decoding region RNA were created, and IF1 binding to wild-type and mutant 30 S subunits was monitored by chemical modification and primer extension with allele-specific primers. The mutations C1407U, A1408G, A1492G, or A1493G disrupt IF1 binding to 30 S subunits, whereas the mutations G530A, U1406A, U1406G, G1491U, U1495A, U1495C, or U1495G had little effect on IF1 binding. Disruption of IF1 binding correlates with the deleterious phenotypic effects of certain mutations. IF1 binding to the A site of the 30 S subunit may modulate subunit association and the fidelity of tRNA selection in the P site through conformational changes in the 16 S rRNA.

6.
7.
8.
9.
10.
Jon Beckwith 《Cell》1981,23(2):307-308
The chromosomal distributions of five families of mouse r-protein genes (S16, L18, L19, L30 and L32/33) were studied by Southern blot analysis of DNA from a panel of mouse-hamster hybrid cells containing various complements of mouse chromosomes. Our results indicated that members of a particular family are often located on more than one chromosome, that extensive clustering of many r-protein gene families on a few chromosomes is unlikely, and that there is no obligatory linkage of r-protein and rRNA genes.

11.
Feature selection for the prediction of translation initiation sites   Cited by: 3 (self-citations: 0, citations by others: 3)
Translation initiation sites (TISs) are important signals in cDNA sequences. In many previous attempts to predict TISs in cDNA sequences, three major factors affect the prediction performance: the nature of the cDNA sequence sets, the relevant features selected, and the classification methods used. In this paper, we examine different approaches to select and integrate relevant features for TIS prediction. The top selected significant features include the features from the position weight matrix and the propensity matrix, the number of nucleotide C in the sequence downstream of the ATG, the number of downstream stop codons, the number of upstream ATGs, and the number of some amino acids, such as amino acids A and D. With the numerical data generated from these features, different classification methods, including decision tree, naive Bayes, and support vector machine, were applied to three independent sequence sets. The identified significant features were found to be biologically meaningful, while the experiments showed promising results.
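
Three of the count-based features named above (upstream ATGs, downstream C content, downstream stop codons) can be computed with a few lines of code. The sketch below is only an illustration: the window size, the choice to count stop codons in frame with the candidate ATG, and the function name `tis_features` are assumptions, not details given in the abstract.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def tis_features(seq, atg_pos, window=99):
    """Count-based features for a candidate ATG at 0-based index atg_pos.

    Assumptions (not from the paper): features are computed within
    `window` bases on each side, and stop codons are counted in frame.
    """
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("atg_pos does not point at an ATG")
    upstream = seq[max(0, atg_pos - window):atg_pos]
    downstream = seq[atg_pos + 3:atg_pos + 3 + window]
    in_frame = [downstream[i:i + 3] for i in range(0, len(downstream) - 2, 3)]
    return {
        "upstream_atg": sum(
            upstream[i:i + 3] == "ATG" for i in range(len(upstream) - 2)
        ),
        "downstream_c": downstream.count("C"),
        "downstream_stop": sum(codon in STOP_CODONS for codon in in_frame),
    }

# Made-up cDNA fragment with a candidate ATG at index 7.
example = "GGATGCCATGGCCTGCTAACC"
print(tis_features(example, 7))
```

The resulting dictionary could be fed, together with other features, to any of the classifiers mentioned in the abstract.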

12.
The amino acid sequence of E. coli UDP-galactose 4-epimerase has been determined through the amino-terminal 28 amino acid residues using an automated protein sequenator. Alignment of UDP-galactose operon messenger RNA and the amino acid sequence of epimerase demonstrates that the first 26 bases in the mRNA are transcribed but do not take part in translation of epimerase.

13.
In a genetic selection designed to isolate Escherichia coli mutations that increase expression of the IS10 transposase gene (tnp), we unexpectedly obtained viable mutants defective in translation initiation factor 3 (IF3). Several lines of evidence led us to conclude that transposase expression, per se, was not increased. Rather, these mutations appear to increase expression of the tnp'–'lacZ gene fusions used in this screen, by increasing translation initiation at downstream, atypical initiation codons. To test this hypothesis we undertook a systematic analysis of start codon requirements and measured the effects of IF3 mutations on initiation from various start codons. Beginning with an efficient translation initiation site, we varied the AUG start codon to all possible codons that differed from AUG by one nucleotide. These potential start codons fall into distinct classes with regard to translation efficiency in vivo: Class I codons (AUG, GUG, and UUG) support efficient translation; Class IIA codons (CUG, AUU, AUC, AUA, and ACG) support translation at levels only 1–3% that of AUG; and Class IIB codons (AGG and AAG) permit levels of translation too low for reliable quantification. Importantly, the IF3 mutations had no effect on translation from Class I codons, but they increased translation from Class II codons 3–5-fold, and this same effect was seen in other gene contexts. Therefore, IF3 is generally able to discriminate between efficient and inefficient codons in vivo, consistent with earlier in vitro observations. We discuss these observations as they relate to IF3 autoregulation and the mechanism of IF3 function.
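
The set of codons "that differed from AUG by one nucleotide" can be enumerated directly, which also shows that the three classes listed in the abstract cover all nine variants. The sketch below is illustrative; the function names are hypothetical, and the class membership is copied from the abstract.

```python
def single_nucleotide_variants(codon="AUG", alphabet="ACGU"):
    """All codons that differ from `codon` at exactly one position."""
    variants = []
    for pos, original in enumerate(codon):
        for base in alphabet:
            if base != original:
                variants.append(codon[:pos] + base + codon[pos + 1:])
    return variants

# Class membership as reported in the abstract.
CLASSES = {
    "I": {"AUG", "GUG", "UUG"},
    "IIA": {"CUG", "AUU", "AUC", "AUA", "ACG"},
    "IIB": {"AGG", "AAG"},
}

def codon_class(codon):
    """Return the reported class of a codon, or None if it was not tested."""
    for name, members in CLASSES.items():
        if codon in members:
            return name
    return None

for variant in single_nucleotide_variants():
    print(variant, codon_class(variant))
```

Three positions times three alternative bases gives nine variants, and each one falls into exactly one of the reported classes.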

14.
Hou S  Chen X  Wang H  Tao M  Hu Z 《BioTechniques》2002,32(4):783-4, 786, 788
Here we describe a convenient method to generate homologous recombinant baculoviral genomes in E. coli. The recombination takes place with the aid of recombination enzymes provided by the phage lambda Red system between a bacmid (a baculoviral genome that can replicate in bacteria) and a linear fragment. Proof of concept was provided when the cathepsin gene (v-cath) of the Helicoverpa armigera single nucleocapsid nucleopolyhedrovirus (HaSNPV) was replaced by the chloramphenicol resistance gene (CmR). First, CmR was inserted between the flanking sequences of the HaSNPV v-cath. Each of the flanking regions was about 1 kb. The fragment was linearized and electroporated into bacteria containing both the HaSNPV bacmid and the lambda Red system. Recombinant bacmids resistant to chloramphenicol were selected. In comparison to the standard co-transfection/plaque assays, this method significantly reduces the time required to construct baculovirus knockout mutants. It may also be useful in the manipulation of other large viral genomes.

15.
16.
Influence of mRNA determinants on translation initiation in Escherichia coli.   Cited by: 11 (self-citations: 0, citations by others: 11)
We have studied the classic initiation elements of mRNA sequence and structure to better understand their influence on translation initiation rates in Escherichia coli. Changes introduced in the initiation codon, the Shine and Dalgarno sequence, the spacing between those two elements, and in the secondary structures within initiation domains each change the rate of 30 S ternary complex formation. We measured these differences using extension inhibition analysis, a technique we have called "toeprinting". The rate of 30 S initiation complex formation in the absence of initiation factors agrees well with in vivo translation rates in some instances, although in others a regulatory role of initiation factors in 30 S complex formation is likely. Nucleotides 5' to the Shine and Dalgarno domain facilitate ternary complex formation.

17.
18.
Understanding regulatory mechanisms of protein synthesis in eukaryotes is essential for the accurate annotation of genome sequences. Kozak reported that the nucleotide sequence GCCGCC(A/G)CCAUGG (AUG is the initiation codon) was frequently observed in vertebrate genes and that this 'consensus' sequence enhanced translation initiation. However, later studies using invertebrate, fungal and plant genes reported different 'consensus' sequences. In this study, we conducted extensive comparative analyses of nucleotide sequences around the initiation codon by using genomic data from 47 eukaryote species including animals, fungi, plants and protists. The analyses revealed that preferred nucleotide sequences are quite diverse among different species, but differences between patterns of nucleotide bias roughly reflect the evolutionary relationships of the species. We also found strong biases of A/G at position -3, A/C at position -2 and C at position +5 that were commonly observed in all species examined. Genes with higher expression levels showed stronger signals, suggesting that these nucleotides are responsible for the regulation of translation initiation. The diversity of preferred nucleotide sequences around the initiation codon might be explained by differences in relative contributions from two distinct patterns, GCCGCCAUG and AAAAAAAUG, which implies the presence of multiple molecular mechanisms for controlling translation initiation.
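
Position-specific nucleotide bias of the kind described above can be tallied with a simple counter. The sketch below assumes the usual numbering convention in which the A of AUG is position +1 and there is no position 0; the three toy sequences and the function names are made up for illustration and are not data from the study.

```python
from collections import Counter

def base_at(seq, aug_index, position):
    """Base at a position numbered relative to AUG (+1 = A of AUG, no 0)."""
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    offset = position - 1 if position > 0 else position
    return seq[aug_index + offset]

def position_counts(records, position):
    """Tally bases at one relative position over (sequence, aug_index) pairs."""
    return Counter(base_at(seq, idx, position) for seq, idx in records)

# Made-up mRNA fragments with the 0-based index of the A in AUG.
records = [
    ("GCCGCCACCAUGGCG", 9),
    ("AAAAAAAUGUCU", 6),
    ("UUUGCCAUGGCA", 6),
]
print(position_counts(records, -3))  # bases three positions upstream of AUG
print(position_counts(records, 5))   # bases at position +5
```

Dividing each count by the number of sequences gives the per-position frequencies that such comparative analyses typically report.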

19.
The region located downstream of the initiation codon constitutes part of the translation initiation signal, significantly affecting the level of protein expression in E. coli. In order to determine its influence on translation initiation, we inserted random 12-base sequences downstream of the initiation codon of the lacZ gene. A total of 119 random clones showing higher beta-galactosidase activities than the control lacZ gene were isolated and subsequently sequenced. Analysis of these clones revealed that their insertion sequences are strikingly rich in A and T, but poor in G, with no consensus sequences among them. Toeprinting experiments and polysome profile analysis confirmed that the A/T-rich sequences enhance translation at the level of initiation. Collectively, the present data demonstrate that A/T richness of the region following the initiation codon plays a significant role in E. coli gene expression.
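
A/T richness of a 12-base insert is simply the fraction of its bases that are A or T. The sketch below shows that calculation on three made-up inserts; the sequences and the function name are illustrative assumptions, not clones from the study.

```python
def at_fraction(seq):
    """Fraction of A or T bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)

# Made-up 12-base inserts.
inserts = ["AAATTTAAATTA", "GCGGCCGGCGCG", "ATGCATGCATGC"]
for insert in inserts:
    print(insert, at_fraction(insert))
```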

20.
S Loechel  J M Inamine  P C Hu 《Nucleic acids research》1991,19(24):6905-6911
The tuf gene of Mycoplasma genitalium uses a signal other than a Shine-Dalgarno sequence to promote translation initiation. We have inserted the translation initiation region of this gene in front of the Escherichia coli lacZ gene and shown that it is recognized by the translational machinery of E. coli; the signal operates in vivo at roughly the same efficiency as a synthetic Shine-Dalgarno sequence. The M. genitalium sequence was also used to replace the native translation initiation region of the cat gene. When assayed in E. coli, the M. genitalium sequence is equivalent to a Shine-Dalgarno sequence in stimulating translation of this mRNA also. Site-directed mutagenesis enabled us to identify some of the bases that comprise the functional sequence. We propose that the sequence UUAACAACAU functions as a ribosome binding site by annealing to nucleotides 1082-1093 of the E. coli 16S rRNA. The activity of this sequence is enhanced when it is present in the loop of a stem-and-loop structure. Additional sequences both upstream and downstream of the initiation codon are also involved, but their role has not been elucidated.
