Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
D Frishman, A Mironov, M Gelfand. Gene, 1999, 234(2):257-265
Exact mapping of gene starts is an important problem in the computer-assisted functional analysis of newly sequenced prokaryotic genomes. We describe an algorithm for finding ribosomal binding sites without a learning sample. This algorithm is particularly useful for the analysis of genomes with few or no experimentally mapped genes. There is a clear correlation between the ribosomal binding site (RBS) properties of a given genome and the potential gene start prediction accuracy. This correlation is of considerable predictive power and may be useful for estimating the expected success of future genome analysis efforts. We also demonstrate that the RBS properties depend on the phylogenetic position of a genome.
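
As a rough illustration of what scanning for a ribosomal binding site can involve, the sketch below looks for the best ungapped match to a Shine-Dalgarno-like consensus in the region upstream of a candidate start codon. This is not the algorithm described in the abstract, which is not detailed here; the consensus string `AGGAGG`, the function name `best_sd_match`, and the scoring by simple match count are all assumptions made for illustration.

```python
# Hypothetical consensus; real genomes vary, which is the point of the abstract.
SD_CONSENSUS = "AGGAGG"


def best_sd_match(upstream, consensus=SD_CONSENSUS):
    """Return (matches, position) for the best ungapped match of `consensus`.

    `upstream` is the sequence immediately 5' of a candidate start codon.
    `position` is the 0-based index of the best window in `upstream`,
    or -1 if no window matches at any position.
    """
    upstream = upstream.upper()
    best = (0, -1)
    width = len(consensus)
    for i in range(len(upstream) - width + 1):
        window = upstream[i:i + width]
        # Count positions where the window agrees with the consensus.
        matches = sum(a == b for a, b in zip(window, consensus))
        if matches > best[0]:
            best = (matches, i)
    return best
```

A candidate start whose upstream region yields a high match count at a plausible distance would be favoured over one that does not; how much weight to give this signal is exactly what varies between genomes.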

2.
Prophage loci often remain under-annotated or even unrecognized in prokaryotic genome sequencing projects. A PHP application, Prophage Finder, has been developed and implemented to predict prophage loci, based upon clusters of phage-related gene products encoded within DNA sequences. This application provides results detailing several facets of these clusters to facilitate rapid prediction and analysis of prophage sequences. Prophage Finder was tested using previously annotated prokaryotic genomic sequences with manually curated prophage loci as benchmarks. Additional analyses from Prophage Finder searches of several draft prokaryotic genome sequences are available through the Web site (http://bioinformatics.uwp.edu/~phage/DOEResults.php) to illustrate the potential of this application.
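
The idea of predicting prophage loci from clusters of phage-related gene products can be sketched as a simple gap-based grouping. The code below is a minimal sketch, not Prophage Finder itself (which is a PHP application whose internals are not given here); the `Gene` record, the function name, and the `max_gap` and `min_hits` thresholds are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int        # start coordinate on the contig
    end: int          # end coordinate on the contig
    phage_like: bool  # True if the product resembles a known phage protein


def find_phage_clusters(genes, max_gap=5000, min_hits=4):
    """Group phage-like genes separated by at most `max_gap` bp.

    Returns a list of (cluster_start, cluster_end, hit_count) tuples for
    groups containing at least `min_hits` phage-like genes.
    Both thresholds are illustrative assumptions.
    """
    hits = sorted((g for g in genes if g.phage_like), key=lambda g: g.start)
    clusters = []
    current = []
    current_end = 0
    for gene in hits:
        # A large gap closes the current group and starts a new one.
        if current and gene.start - current_end > max_gap:
            if len(current) >= min_hits:
                clusters.append((current[0].start, current_end, len(current)))
            current = []
        current.append(gene)
        current_end = gene.end if len(current) == 1 else max(current_end, gene.end)
    if len(current) >= min_hits:
        clusters.append((current[0].start, current_end, len(current)))
    return clusters
```

Each reported cluster is only a candidate locus; in practice it would still need comparison against curated prophage boundaries, as the abstract describes for benchmarking.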

3.
Although non-coding RNA (ncRNA) genes do not encode proteins, they play vital roles in cells by producing functionally important RNAs. In this paper, we present a novel method for predicting ncRNA genes based on compositional features extracted directly from gene sequences. Our method consists of two Support Vector Machine (SVM) models: a Codon model, which uses codon usage features derived from ncRNA genes and protein-coding genes, and a Kmer model, which uses nucleotide and dinucleotide frequency features extracted from ncRNA genes and randomly chosen genome sequences, respectively. The 10-fold cross-validation accuracy for the two models is found to be 92% and 91%, respectively. Thus, we could make an automatic prediction of ncRNA genes in one genome without manual filtration of protein-coding genes. After applying our method to the Sulfolobus solfataricus genome, 25 prediction results have been generated according to 25 cut-off pairs. We have also applied the approach to E. coli and found our results comparable to those of previous studies. In general, our method enables automatic identification of ncRNA genes in newly sequenced prokaryotic genomes.
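
The nucleotide and dinucleotide frequency features mentioned for the Kmer model can be computed in a few lines. The sketch below shows one plausible way to build such a feature vector; the function names, the use of overlapping windows, and the ordering of the 20 features are assumptions, and the SVM training step itself is not shown.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers over overlapping windows.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def composition_features(seq):
    """Return 4 nucleotide plus 16 dinucleotide frequencies as a 20-element list."""
    mono = kmer_frequencies(seq, 1)
    di = kmer_frequencies(seq, 2)
    return [mono[m] for m in sorted(mono)] + [di[m] for m in sorted(di)]
```

A vector like this, computed for each candidate sequence, is the kind of input a classifier could be trained on; the choice of kernel and cut-offs is left open here because the abstract does not specify them.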

4.
5.
Within the early region of bacteriophage T7, three genes, 0.3, 1 and 1.3, are most efficiently expressed. Their initiation signals are among the strongest in Escherichia coli. In the T7 wild-type situation the proteins are produced with a molar ratio of gene 1:1.3:0.3 protein = 1:3.9:9.7. DNA fragments of about 30 base pairs comprising the ribosomal binding sites (RBS) of these genes were synthesized and cloned into two derivatives of the pDS1 vector just upstream of the mouse dihydrofolate reductase gene. Although all tested RBS fragments contained an initiation triplet, a Shine-Dalgarno sequence and some nucleotides upstream and downstream of this region, only the gene 1.3 RBS fragment showed high efficiency, whereas those of genes 0.3 and 1 were at the border of significance. The amount of synthesized mRNA was about the same for all three constructs. A major influence of vector-derived sequences on the RBS activity could be ruled out. The high translational activity of the short 1.3 gene RBS seems to be largely due to its primary structure. The other two RBSs studied require much longer sequences for high activity.

6.
Temperate phages have the ability to maintain their genome in their host, a process called lysogeny. For most, passive replication of the phage genome relies on integration into the host's chromosome and becoming a prophage. Prophages remain silent in the absence of stress and replicate passively within their host genome. However, when stressful conditions occur, a prophage excises itself and resumes the viral cycle. Integration and excision of phage genomes are mediated by regulated site-specific recombination catalyzed by tyrosine and serine recombinases. In the KplE1 prophage, site-specific recombination is mediated by the IntS integrase and the TorI recombination directionality factor (RDF). We previously described a sub-family of temperate phages that is characterized by an unusual organization of the recombination module. Consequently, the attL recombination region overlaps with the integrase promoter, and the integrase and RDF genes do not share a common activated promoter upon lytic induction as in the lambda prophage. In this study, we show that the intS gene is tightly regulated by its own product as well as by the TorI RDF protein. In silico analysis revealed that overlap of the attL region with the integrase promoter is widely encountered in prophages present in prokaryotic genomes, suggesting a general occurrence of negatively autoregulated integrase genes. The prediction that these integrase genes are negatively autoregulated was biologically assessed by studying the regulation of several integrase genes from two different Escherichia coli strains. Our results suggest that the majority of tRNA-associated integrase genes in prokaryotic genomes could be autoregulated and that this might be correlated with the recombination efficiency as in KplE1. The consequences of this unprecedented regulation for excisive recombination are discussed.

7.
We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in GC-rich bacterial genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD can also take protein similarity information into account both in its prediction and in its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation, and the ability to learn new models for new organisms.

8.
9.
Advances in genome sequencing have produced hundreds of thousands of bacterial genome sequences, many of which have integrated prophages derived from temperate bacteriophages. These prophages play key roles by influencing bacterial metabolism, pathogenicity, antibiotic resistance, and defense against viral attack. However, they vary considerably even among related bacterial strains, and they are challenging to identify computationally and to extract precisely for comparative genomic analyses. Here, we describe DEPhT, a multimodal tool for prophage discovery and extraction. It has three run modes that facilitate rapid screening of large numbers of bacterial genomes, precise extraction of prophage sequences, and prophage annotation. DEPhT uses genomic architectural features that discriminate between phage and bacterial sequences for efficient prophage discovery, and targeted homology searches for precise prophage extraction. DEPhT is designed for prophage discovery in Mycobacterium genomes but can be adapted broadly to other bacteria. We deploy DEPhT to demonstrate that prophages are prevalent in Mycobacterium strains but are absent not only from the few well-characterized Mycobacterium tuberculosis strains but also from all ∼30 000 sequenced M. tuberculosis strains.

10.
11.
Multiple copies of a given ribosomal RNA gene family undergo concerted evolution such that sequences of all gene copies are virtually identical within a species although they diverge normally between species. In eukaryotes, gene conversion and unequal crossing over are the proposed mechanisms for concerted evolution of tandemly repeated sequences, whereas dispersed genes are homogenized by gene conversion. However, the homogenization mechanisms for multiple-copy, normally dispersed, prokaryotic rRNA genes are not well understood. Here we compared the sequences of multiple paralogous rRNA genes within a genome in 12 prokaryotic organisms that have multiple copies of the rRNA genes. Within a genome, putative sequence conversion tracts were found throughout the entire length of each individual rRNA gene and its immediate flanks. Individual conversion events convert only a short sequence tract, and the conversion partners can be any paralogous genes within the genome. Interestingly, the genic sequences undergo much slower divergence than their flanking sequences. Moreover, genomic context and operon organization do not affect rRNA gene homogenization. Thus, gene conversion underlies concerted evolution of bacterial rRNA genes, which normally occurs within genic sequences, and homogenization of flanking regions may result from co-conversion with the genic sequence. Received: 31 March 2000 / Accepted: 15 June 2000

12.
Using data from a partial protein sequence analysis of ribosomal proteins derived from the archaebacterium Methanococcus vannielii, oligonucleotide probes were synthesized. The probes enabled us to localize several ribosomal protein genes and to determine their nucleotide sequences. The amino acid sequences that were deduced from the genes correspond to proteins L12 and L10 from the rif operon, according to the genome organization in Escherichia coli, and to proteins L23 and L2, which have comparable locations, as in the Escherichia coli S10 operon. Various degrees of similarity were found when the four proteins were compared with the corresponding ribosomal proteins of prokaryotic or eukaryotic organisms. The highest sequence homology was found in counterparts from other archaebacteria, such as Halobacterium marismortui, Halobacterium halobium, or Sulfolobus. In general, the M. vannielii protein sequences were more related to the eukaryotic kingdom than to the Gram-positive or Gram-negative eubacteria. On the other hand, the organization of the ribosomal protein genes clearly follows the operon structure of the Escherichia coli genome and is different from the monocistronic eukaryotic gene arrangements. The protein coding regions were not interrupted by introns. Furthermore, the Shine-Dalgarno type sequences of methanogenic bacteria are homologous with those of eubacteria, and also their terminator regions are similar.

13.
Turova TP. Mikrobiologiia, 2003, 72(4):437-452
Different aspects of the presence of multiple copies of ribosomal operons in prokaryotic genomes are reviewed. The structure of prokaryotic ribosomal operons is briefly described. The available data are summarized regarding the copy number of ribosomal genes in various prokaryotic genomes, the degree of polymorphism of their individual copies, and physiological and evolutionary aspects of the presence of the multiple copies of ribosomal genes. The review also considers the influence of the presence of multiple copies of ribosomal genes on the results of identification of prokaryotic isolates and of the studies of prokaryotic diversity in environmental samples based on phylogenetic analysis of 16S rRNA gene sequences.

14.
Tourova T. P. Microbiology, 2003, 72(4):389-402
Different aspects of the presence of multiple copies of ribosomal operons in prokaryotic genomes are reviewed. The structure of prokaryotic ribosomal operons is briefly described. The available data are summarized regarding the copy number of ribosomal genes in various prokaryotic genomes, the degree of polymorphism of their individual copies, and physiological and evolutionary aspects of the presence of the multiple copies of ribosomal genes. The review also considers the influence of the presence of multiple copies of ribosomal genes on the results of identification of prokaryotic isolates and of the studies of prokaryotic diversity in environmental samples based on phylogenetic analysis of 16S rRNA gene sequences.

15.
Ab initio gene identification in metagenomic sequences   (Cited by: 1 in total; self-citations: 0; citations by others: 1)
We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has been used since for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse, it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that, as a result of applying the new method, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes.

16.

Background  

Until today, analysis of 16S ribosomal RNA (rRNA) sequences has been the de facto gold standard for the assessment of phylogenetic relationships among prokaryotes. However, the branching order of the individual phyla is not well resolved in 16S rRNA-based trees. In search of an improvement, new phylogenetic methods have been developed alongside the growing availability of complete genome sequences. Unfortunately, only a few genes in prokaryotic genomes qualify as universal phylogenetic markers, and almost all of them have a lower information content than the 16S rRNA gene. Therefore, emphasis has been placed on methods that are based on multiple genes or even entire genomes. The concatenation of ribosomal protein sequences is one method to which an improved resolution has been ascribed. Since there is neither a comprehensive database for ribosomal protein sequences nor a tool that assists in sequence retrieval and generation of the respective input files for phylogenetic reconstruction programs, RibAlign has been developed to fill this gap.

17.
18.
ABSTRACT: BACKGROUND: Gene prediction algorithms (or gene callers) are an essential tool for analyzing shotgun nucleic acid sequence data. Gene prediction is a ubiquitous step in sequence analysis pipelines; it reduces the volume of data by identifying the most likely reading frame for a fragment, permitting the out-of-frame translations to be ignored. In this study we evaluate five widely used ab initio gene-calling algorithms (FragGeneScan, MetaGeneAnnotator, MetaGeneMark, Orphelia, and Prodigal) for accuracy on short (75-1000 bp) fragments containing sequence error from previously published artificial data and "real" metagenomic datasets. RESULTS: While gene prediction tools have similar accuracies predicting genes on error-free fragments, in the presence of sequencing errors considerable differences between tools become evident. For error-containing short reads, FragGeneScan finds more prokaryotic coding regions than do MetaGeneAnnotator, MetaGeneMark, Orphelia, or Prodigal. This improved detection of genes in error-containing fragments, however, comes at the cost of much lower (50%) specificity and overprediction of genes in noncoding regions. CONCLUSIONS: Ab initio gene callers offer a significant reduction in the computational burden of annotating individual nucleic acid reads and are used in many metagenomic annotation systems. For predicting reading frames on raw reads, we find the hidden Markov model approach in FragGeneScan is more sensitive than other gene prediction tools, while Prodigal, MGA, and MGM are better suited for higher-quality sequences such as assembled contigs.
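
To make the notion of "the most likely reading frame for a fragment" concrete, the sketch below picks, among the six possible frames, the one with the longest run of codons uninterrupted by a stop codon. This is a deliberately naive heuristic for illustration only; it is not how any of the five evaluated gene callers work (the abstract notes, for example, that FragGeneScan uses a hidden Markov model), and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_open_run(seq, offset):
    """Return the longest stop-free run, in codons, for the frame at `offset`."""
    best = run = 0
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0  # a stop codon ends the current open stretch
        else:
            run += 1
            best = max(best, run)
    return best


def best_frame(fragment):
    """Return (strand, offset, codons) for the frame with the longest open run.

    Ties are resolved in favour of the first frame examined
    (forward strand first, then offsets 0, 1, 2).
    """
    fragment = fragment.upper()
    candidates = []
    for strand, seq in (("+", fragment), ("-", reverse_complement(fragment))):
        for offset in range(3):
            candidates.append((longest_open_run(seq, offset), strand, offset))
    codons, strand, offset = max(candidates, key=lambda c: c[0])
    return strand, offset, codons
```

A heuristic like this has no model of sequencing error, so a single inserted or deleted base shifts the apparent frame; that sensitivity to errors is the kind of behaviour the study sets out to measure in real tools.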

19.
20.