Similar articles
 Found 20 similar articles; search took 31 ms
1.
Recent advances in DNA sequencers are accelerating genome sequencing, especially in microbes, and complete and draft genomes from various species have been sequenced in rapid succession. Here, we present a comprehensive gene prediction tool, the MetaGeneAnnotator (MGA), which precisely predicts all kinds of prokaryotic genes from a single or a set of anonymous genomic sequences having a variety of lengths. The MGA integrates statistical models of prophage genes, in addition to those of bacterial and archaeal genes, and also uses a self-training model from input sequences for predictions. As a result, the MGA sensitively detects not only typical genes but also atypical genes, such as horizontally transferred and prophage genes in a prokaryotic genome. In this paper, we also propose a novel approach for analyzing the ribosomal binding site (RBS), which enables us to detect species-specific patterns of the RBSs. The MGA has the ingenious RBS model based on this approach, and precisely predicts translation starts of genes. The MGA also succeeds in improving prediction accuracies for short sequences by using the adapted RBS models (96% sensitivity and 93% specificity for 700 bp fragments). These features of the MGA expedite wide ranges of microbial genome studies, such as genome annotations and metagenome analyses. Key words: bioinformatics, gene-finding, prokaryote, phage, ribosomal binding site

2.
We have replaced the ribosomal binding site (RBS) of the lacZ gene of E. coli with those of the maturation (A) gene of phage MS2 and of the tufA gene. Both RBSs contain a GUG initiation codon. The expression with the tufA RBS is at least 25-fold higher than with the phage RBS. Changing the GUG into AUG results in a 3-fold increase in expression in both cases. In general, higher expression is accompanied by an increase of lac-specific mRNA. It is argued that this is a consequence of the more efficient translation of the mRNA.

3.
4.
5.
6.
TA cloning and blunt end ligation are two simple methods for direct cloning of PCR fragments, and both would greatly benefit high-throughput (HTP) cloning constructions if their efficiency could be improved. In this study, we have developed a ribosomal binding site (RBS) switching strategy for direct cloning of PCR fragments. The RBS is an A/G-rich region upstream of the translational start codon and is essential for gene expression. A change from A/G to T/C in the RBS blocks its activity and thereby abolishes gene expression. Based on this property, we introduced an inactive RBS upstream of a selectable marker gene, and designed a fragment insertion site within this inactive RBS. Forward and reverse insertions of specifically tailed fragments form an active and an inactive RBS, respectively, so all background from vector self-ligation and reverse fragment insertions is eliminated because the marker gene is not expressed. The effectiveness of our strategy for TA cloning and blunt end ligation is confirmed. Application of this strategy to gene over-expression, a bacterial two-hybrid system, a bacterial one-hybrid system, and promoter bank construction is also verified. The advantages of this simple procedure, together with its low cost and high efficiency, make our strategy extremely useful in HTP cloning constructions.

7.
Bacterial start site prediction.   Total citations: 5 (self-citations: 1, citations by others: 4)
With the growing number of completely sequenced bacterial genes, accurate gene prediction in bacterial genomes remains an important problem. Although the existing tools predict genes in bacterial genomes with high overall accuracy, their ability to pinpoint the translation start site remains unsatisfactory. In this paper, we present a novel approach to bacterial start site prediction that takes into account multiple features of a potential start site, viz., ribosome binding site (RBS) binding energy, distance of the RBS from the start codon, distance from the beginning of the maximal ORF to the start codon, the start codon itself and the coding/non-coding potential around the start site. Mixed integer programming was used to optimize the discriminatory system. The accuracy of this approach is up to 90%, compared to 70%, using the most common tools in fully automated mode (that is, without expert human post-processing of results). The approach is evaluated using Bacillus subtilis, Escherichia coli and Pyrococcus furiosus. These three genomes cover a broad spectrum of bacterial genomes, since B. subtilis is a Gram-positive bacterium, E. coli is a Gram-negative bacterium and P. furiosus is an archaebacterium. A significant problem is generating a set of 'true' start sites for algorithm training, in the absence of experimental work. We found that sequence conservation between P. furiosus and the related Pyrococcus horikoshii clearly delimited the gene start in many cases, providing a sufficient training set.
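Two of the features listed in this abstract (the start codon itself and its distance from the beginning of the maximal ORF) can be illustrated with a minimal Python sketch. It is not the authors' method: the function name `candidate_starts`, the start codon set ATG/GTG/TTG, and the choice to measure distance from the most upstream in-frame start codon are all assumptions made for illustration, and the RBS binding energy, coding potential and mixed integer programming steps are not modelled.

```python
# Hypothetical illustration only; names and conventions are assumptions.
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed set of bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}

def candidate_starts(seq, stop_index):
    """List in-frame candidate start codons upstream of a stop codon.

    seq        -- forward-strand DNA string
    stop_index -- index of the first base of the ORF's stop codon
    Returns (position, codon, distance_from_most_upstream_start) tuples,
    ordered from most upstream to most downstream.
    """
    seq = seq.upper()
    if seq[stop_index:stop_index + 3] not in STOP_CODONS:
        raise ValueError("stop_index does not point at a stop codon")
    positions = []
    i = stop_index - 3
    # Walk upstream one codon at a time until the previous in-frame stop
    # codon or the beginning of the sequence is reached.
    while i >= 0 and seq[i:i + 3] not in STOP_CODONS:
        if seq[i:i + 3] in START_CODONS:
            positions.append(i)
        i -= 3
    positions.reverse()
    if not positions:
        return []
    first = positions[0]  # treated here as the beginning of the maximal ORF
    return [(p, seq[p:p + 3], p - first) for p in positions]
```

For example, `candidate_starts("TAAATGAAAGTGCCCTAA", 15)` lists the ATG at position 3 and the GTG at position 9 as competing start sites; a real predictor would then score each candidate with the remaining features.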

8.
Thousands of proteins make up a chloroplast, but fewer than 100 are encoded by the chloroplast genome. Despite this low number, expression of chloroplast-encoded genes is essential for plant survival. Every chloroplast has its own gene expression system with a major regulatory point at the initiation of protein synthesis (translation). In chloroplasts, most protein-encoding genes contain elements resembling the ribosome binding sites (RBS) found in prokaryotes. In vitro, these putative chloroplast ribosome binding sequences vary in their ability to support translation. Here we report results from an investigation into effects of the predicted RBS for the tobacco chloroplast atpI gene on translation in vivo. Two reporter constructs, differing only in their 5'-untranslated regions (5'UTRs) were stably incorporated into tobacco chloroplast genomes and their expression analyzed. One 5'UTR was derived from the wild-type (WT) atpI gene. The second, Holo-substitution (Holo-sub), had nonchloroplast sequence replacing all wild-type nucleotides, except for the putative RBS. The abundance of reporter RNA was the same for both 5'UTRs. However, translation controlled by Holo-sub was less than 4% that controlled by WT. These in vivo experiments support the idea that translation initiation in land plant chloroplasts depends on 5'UTR elements outside the putative RBS.

9.
Within the early region of bacteriophage T7 three genes, 0.3, 1 and 1.3, are most efficiently expressed. They belong to the strongest initiation signals of Escherichia coli. In the T7 wild-type situation the proteins are produced with a molar ratio of gene 1:1.3:0.3 protein = 1:3.9:9.7. DNA fragments of about 30 base pairs comprising the ribosomal binding sites (RBS) of these genes were synthesized and cloned into two derivatives of the pDS1 vector just upstream of the mouse dihydrofolate reductase gene. Although all tested RBS fragments contained an initiation triplet, a Shine-Dalgarno sequence and some nucleotides upstream and downstream of this region, only the gene 1.3 RBS fragment showed high efficiency whereas those of genes 0.3 and 1 were at the border of significance. The amount of synthesized mRNA was about the same for all three constructs. A major influence of vector-derived sequences on the RBS activity could be ruled out. The high translational activity of the short 1.3 gene RBS seems to be largely due to its primary structure. The other two RBSs studied require much longer sequences for high activity.

10.
11.
Microbial genome sequences provide us with the fossil records for inferring their origination and evolution. Assuming that current microbial genomes are the evolutionary results of ancient genomes or fragments and the neighboring genes in ancient genomes are more likely neighbors in current genomes, in this paper we proposed a paleontological algorithm and assembled the orthologous gene groups from 66 complete and current microbial genome sequences into a pseudo-ancient genome, which consists of continuous fragments of various sizes. We performed bootstrap resampling and correlation analyses and the results showed that the assembled ancient genome and fragments are statistically significant and the genes of the same fragment are inherently related and likely derived from common ancestors. This method provides a new computational tool for studying microbial genome structure and evolution.

12.
13.
MBGD is a workbench system for comparative analysis of completely sequenced microbial genomes. The central function of MBGD is to create an orthologous gene classification table using precomputed all-against-all similarity relationships among genes in multiple genomes. In MBGD, an automated classification algorithm has been implemented so that users can create their own classification table by specifying a set of organisms and parameters. This feature is especially useful when the user's interest is focused on some taxonomically related organisms. The created classification table is stored in the database and can be explored in combination with the data of individual genomes as well as similarity relationships among genomes. Using these data, users can carry out comparative analyses from various points of view, such as phylogenetic pattern analysis, gene order comparison and detailed gene structure comparison. MBGD is accessible at http://mbgd.genome.ad.jp/.

14.
Using a previously described vector (pKL203) we fused several heterologous ribosomal binding sites (RBSs) to the lacZ gene of E. coli and then studied the variation in expression of the fusions. The RBSs originated from bacteriophage Q beta and MS2 genes and the E. coli genes for elongation factor EF-Tu A and B and ribosomal protein L11 (rplK). The synthesis of the lacZ fusion proteins was measured by an immunoprecipitation method and found to vary at least 100-fold. Lac-specific mRNA synthesis follows the variation in protein production. It appears that there is a correlation between the efficiency of an RBS to function in the expression of the fused gene and the lack of secondary structure, involving the Shine and Dalgarno nucleotides (SD nts) and/or the initiation codon. This efficiency is context dependent. The sequence of the SD nts and the length and sequence of the spacer region up to the initiation codon alone are not able to explain our results. Deletion mutations, created in the phage Q beta replicase RBS, reveal a complex pattern of control of expression, probably involving the use of a "false" initiation site.

15.
Expression of the phi X174 lysis (E) gene, a member of an overlapping gene pair, appears to depend on a frameshift-induced chain termination by ribosomes translating the upstream D gene. A -1 reading frameshift, possibly induced by misreading of an alanine codon as a doublet, causes ribosomes to terminate translation at two different sites, suggesting two modes of regulating expression of the E gene. One frameshift can cause translational termination at a stop codon(s) near the E gene ribosome binding site (RBS), resulting in reinitiation by ribosomes at the E gene RBS. Termination at a second site some 70 bases upstream from the E gene RBS, while too far away to allow ribosomal re-initiation at the E gene RBS, probably results in an unmasking of the message, allowing entry of a new ribosome at the E gene RBS.

16.
We previously reported two graph algorithms for analysis of genomic information: a graph comparison algorithm to detect locally similar regions called correlated clusters and an algorithm to find a graph feature called P-quasi complete linkage. Based on these algorithms we have developed an automatic procedure to detect conserved gene clusters and align orthologous gene orders in multiple genomes. In the first step, the graph comparison is applied to pairwise genome comparisons, where the genome is considered as a one-dimensionally connected graph with genes as its nodes, and correlated clusters of genes that share sequence similarities are identified. In the next step, the P-quasi complete linkage analysis is applied to grouping of related clusters and conserved gene clusters in multiple genomes are identified. In the last step, orthologous relations of genes are established among each conserved cluster. We analyzed 17 completely sequenced microbial genomes and obtained 2313 clusters when the completeness parameter P was 40%. About one quarter contained at least two genes that appeared in the metabolic and regulatory pathways in the KEGG database. This collection of conserved gene clusters is used to refine and augment ortholog group tables in KEGG and also to define ortholog identifiers as an extension of EC numbers.

17.
Analysis of evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. For the species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce mechanisms of genome rearrangements that have formed each of them. We first compared strains N315 and Mu50, which make one of the most closely related strain pairs, at the single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed the importance of recombination involving long conserved sequences in the evolution of paralogous genes in the genome.

18.
We sequenced most of the mitochondrial (mt) genomes of 2 apocritan taxa: Vanhornia eucnemidarum and Primeuchroeus spp. These mt genomes have similar nucleotide composition and codon usage to those of mt genomes reported for other Hymenoptera, with a total A + T content of 80.1% and 78.2%, respectively. Gene content corresponds to that of other metazoan mt genomes, but gene organization is not conserved. There are a total of 6 tRNA genes rearranged in V. eucnemidarum and 9 in Primeuchroeus spp. Additionally, several noncoding regions were found in the mt genome of V. eucnemidarum, as well as evidence of a sustained gene duplication involving 3 tRNA genes. We also report an inversion of the large and small ribosomal RNA genes in Primeuchroeus spp. mt genome. However, none of the rearrangements reported are phylogenetically informative with respect to the current taxon sample.
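The A + T content quoted in this abstract is a simple composition statistic. A minimal Python sketch of how such a percentage could be computed follows; the function name `at_content` and the decision to ignore ambiguous bases such as N are assumptions for illustration, not details taken from the paper.

```python
def at_content(seq):
    """Return the A + T percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N or gap characters, is skipped (an assumption of this sketch).
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(1 for b in bases if b in "AT")
    return 100.0 * at / len(bases)
```

Called on a full mt genome sequence string, this would yield a figure comparable to the 80.1% and 78.2% values reported above, provided the same bases are counted.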

19.
RNAmmer: consistent and rapid annotation of ribosomal RNA genes   Total citations: 7 (self-citations: 0, citations by others: 7)
The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.

20.
Given two genomes with duplicate genes, Zero Exemplar Distance is the problem of deciding whether the two genomes can be reduced to the same genome without duplicate genes by deleting all but one copy of each gene in each genome. Blin, Fertin, Sikora, and Vialette recently proved that Zero Exemplar Distance for monochromosomal genomes is NP-hard even if each gene appears at most two times in each genome, thereby settling an important open question on genome rearrangement in the exemplar model. In this article, we give a very simple alternative proof of this result. We also study the problem Zero Exemplar Distance for multichromosomal genomes without gene order, and prove the analogous result that it is also NP-hard even if each gene appears at most two times in each genome. For the positive direction, we show that both variants of Zero Exemplar Distance admit polynomial-time algorithms if each gene appears exactly once in one genome and at least once in the other genome. In addition, we present a polynomial-time algorithm for the related problem Exemplar Longest Common Subsequence in the special case that each mandatory symbol appears exactly once in one input sequence and at least once in the other input sequence. This answers an open question of Bonizzoni et al. We also show that Zero Exemplar Distance for multichromosomal genomes without gene order is fixed-parameter tractable in the general case if the parameter is the maximum number of chromosomes in each genome.
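The decision problem defined in this abstract can be made concrete with a small brute-force Python sketch for the monochromosomal case. It is an illustration of the definition only, not an algorithm from the article: the function names are hypothetical, genomes are modelled as plain sequences of gene labels, and gene orientation (signs) and chromosome reversal are ignored as a simplifying assumption. The search enumerates every way of keeping one copy per gene, so its running time is exponential, which is in keeping with the NP-hardness result.

```python
from itertools import product

def exemplars(genome):
    """Yield every exemplar of `genome`: each gene kept exactly once,
    with the kept copies left in their original relative order."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(gene, []).append(index)
    # Choose one position per gene, then read the kept genes left to right.
    for choice in product(*positions.values()):
        yield tuple(genome[i] for i in sorted(choice))

def zero_exemplar_distance(genome_a, genome_b):
    """Return True if both genomes can be reduced to the same
    duplicate-free genome (unsigned genes, no reversal considered)."""
    if set(genome_a) != set(genome_b):
        return False
    reachable = set(exemplars(genome_a))
    return any(e in reachable for e in exemplars(genome_b))
```

For instance, `zero_exemplar_distance("abcab", "bcab")` is true because both genomes reduce to `bca`, while `zero_exemplar_distance("abcab", "bac")` is false.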
