Similar articles
 20 similar articles found (search time: 15 ms)
1.
Yan Y, Moult J. Proteins, 2006, 64(3): 615-628
Operons are clusters of genes that are transcribed as a single message, and regulated by the same gene expression machinery. They are found primarily in prokaryotic genomes. Because genes in the same operon are likely to have related functions, identification of the operon structure is potentially useful for assigning gene function. We report the development and benchmarking of two different methods for detecting operons, based on an analysis of 42 fully sequenced prokaryotic organisms. The Gene Neighbor Method (GNM) utilizes the relatively high conservation of gene order in operons, compared with genes in general. The Gene Gap Method (GGM) makes use of the relatively short gap between genes in operons compared with that otherwise found between adjacent genes. The methods have been benchmarked using KEGG pathway data and RegulonDB Escherichia coli operon data. With optimum parameters, the specificity of the GNM is 93% and the sensitivity is 70%. For the GGM, the specificity is 95% and the sensitivity is 68%. Together, the two methods have a sensitivity of 87.2%, while joint predictions have a sensitivity of 50% and a specificity of 98%. The methods are used to infer possible functions for some hypothetical genes in prokaryotic genomes. The methods have proven a useful addition to structure information in deriving protein function in a structural genomics project.
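
The gap-based idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Gene` record, the `predict_operons` function, the same-strand requirement and the 50 bp `max_gap` threshold are all assumptions made for illustration, since the abstract gives no parameter values.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap bp.

    max_gap is an illustrative threshold, not a value from the abstract.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    operons: List[List[str]] = []
    prev = None
    for gene in ordered:
        same_unit = (
            prev is not None
            and gene.strand == prev.strand
            and gene.start - prev.end - 1 <= max_gap
        )
        if same_unit:
            operons[-1].append(gene.name)   # extend the current candidate operon
        else:
            operons.append([gene.name])     # start a new transcription unit
        prev = gene
    return operons


toy_genes = [
    Gene("a", 1, 900, "+"),
    Gene("b", 920, 1800, "+"),   # 19 bp after a: grouped with a
    Gene("c", 2300, 3000, "+"),  # 499 bp after b: new unit
    Gene("d", 3010, 3500, "-"),  # close to c but on the opposite strand: new unit
]
print(predict_operons(toy_genes))
```

A single fixed threshold is the simplest possible choice here; a real predictor would tune it against benchmark operons such as the RegulonDB set mentioned above.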

2.
3.
Butler DK, Gillespie D, Steele B. Genetics, 2002, 161(3): 1065-1075
Large DNA palindromes form sporadically in many eukaryotic and prokaryotic genomes and are often associated with amplified genes. The presence of a short inverted repeat sequence near a DNA double-strand break has been implicated in the formation of large palindromes in a variety of organisms. Previously we have established that in Saccharomyces cerevisiae a linear DNA palindrome is efficiently formed from a single-copy circular plasmid when a DNA double-strand break is introduced next to a short inverted repeat sequence. In this study we address whether the linear palindromes form by an intermolecular reaction (that is, a reaction between two identical fragments in a head-to-head arrangement) or by an unusual intramolecular reaction, as it apparently does in other examples of palindrome formation. Our evidence supports a model in which palindromes are primarily formed by an intermolecular reaction involving homologous recombination of short inverted repeat sequences. We have also extended our investigation into the requirement for DNA double-strand break repair genes in palindrome formation. We have found that a deletion of the RAD52 gene significantly reduces palindrome formation by intermolecular recombination and that deletions of two other genes in the RAD52-epistasis group (RAD51 and MRE11) have little or no effect on palindrome formation. In addition, palindrome formation is dramatically reduced by a deletion of the nucleotide excision repair gene RAD1.

4.
Large-scale prokaryotic gene prediction and comparison to genome annotation   Cited by: 4 (self-citations: 0; citations by others: 4)
MOTIVATION: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. This makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome comparison either on a large or small scale would be facilitated by using a single standard for annotation, which incorporates a transparency of why an open reading frame (ORF) is considered to be a gene. RESULTS: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to approximately 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing in the annotation of some of the genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation.

5.
Multiple copies of a given ribosomal RNA gene family undergo concerted evolution such that sequences of all gene copies are virtually identical within a species although they diverge normally between species. In eukaryotes, gene conversion and unequal crossing over are the proposed mechanisms for concerted evolution of tandemly repeated sequences, whereas dispersed genes are homogenized by gene conversion. However, the homogenization mechanisms for multiple-copy, normally dispersed, prokaryotic rRNA genes are not well understood. Here we compared the sequences of multiple paralogous rRNA genes within a genome in 12 prokaryotic organisms that have multiple copies of the rRNA genes. Within a genome, putative sequence conversion tracts were found throughout the entire length of each individual rRNA gene and its immediate flanks. Individual conversion events convert only a short sequence tract, and the conversion partners can be any paralogous genes within the genome. Interestingly, the genic sequences undergo much slower divergence than their flanking sequences. Moreover, genomic context and operon organization do not affect rRNA gene homogenization. Thus, gene conversion underlies concerted evolution of bacterial rRNA genes, which normally occurs within genic sequences, and homogenization of flanking regions may result from co-conversion with the genic sequence. Received: 31 March 2000 / Accepted: 15 June 2000

6.
Current Bayesian microarray models that pool multiple studies assume gene expression is independent of other genes. However, in prokaryotic organisms, genes are arranged in units that are co-regulated (called operons). Here, we introduce a new Bayesian model for pooling gene expression studies that incorporates operon information into the model. Our Bayesian model borrows information from other genes within the same operon to improve estimation of gene expression. The model produces the gene-specific posterior probability of differential expression, which is the basis for inference. We found in simulations and in biological studies that incorporating co-regulation information improves upon the independence model. We assume that each study contains two experimental conditions: a treatment and control. We note that there exist environmental conditions for which genes that are supposed to be transcribed together lose their operon structure, and that our model is best carried out for known operon structures.

7.
Simple sequence repeats (SSRs) in DNA sequences are tandem iterations of a single nucleotide or a short oligonucleotide. SSRs are subject to slipped-strand mutations and are a common source of phase variation in bacteria and antigenic variation in pathogens. Significantly long SSRs are generally rare in prokaryotic genomes, and long SSRs composed of iterations of mono-, di-, tri-, and tetranucleotides are mostly restricted to host-adapted pathogens. We present new results concerning associations between long SSRs and genes related to different cellular functions in genomes of host-adapted pathogens. We found that in the majority of the analyzed genomes, at least some of the genes associated with SSRs encode potential antigens, which is expected if the primary function of SSRs is their contribution to antigenic variation. However, we also found a number of long SSRs associated with housekeeping genes, including rRNA and tRNA genes, genes encoding ribosomal proteins, amino acyl-tRNA synthetases, chaperones, and important metabolic enzymes. Many of these genes are probably essential and it is unlikely that they are phase-variable. Few statistically significant associations between SSRs and gene functional classifications were detected, suggesting that most long SSRs are not related to a particular cellular function or process. Long SSRs in Mycobacterium leprae are mostly associated with pseudogenes and may be contributing to gene loss following the adaptation to an obligate pathogenic lifestyle. We speculate that long SSRs may have played a similar role in genome reduction of other host-adapted pathogens.
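
As a rough illustration of what "tandem iterations of mono-, di-, tri-, and tetranucleotides" means in practice, the following Python sketch scans a sequence for such repeats with a regular expression. The function name `find_ssrs` and the `min_copies` cutoff are assumptions for illustration; the abstract's statistical criterion for a "significantly long" SSR is not reproduced here.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, min_copies: int = 4) -> List[Tuple[int, str, int]]:
    """Return (0-based start, repeat unit, copy number) for tandem repeats.

    Units of 1 to 4 nucleotides are considered. min_copies is an
    illustrative cutoff, not the significance test used in the study.
    """
    # ([ACGT]{1,4}?) lazily captures the shortest unit; \1{n,} requires at
    # least n further tandem copies of that same unit.
    pattern = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


print(find_ssrs("GGCATATATATCC"))
```

Because the unit is matched lazily, a run such as `AAAAAAAA` is reported as eight copies of `A` rather than two copies of `AAAA`.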

8.
The extent to which prokaryotic evolution has been influenced by horizontal gene transfer (HGT) and therefore might be more of a network than a tree is unclear. Here we use supertree methods to ask whether a definitive prokaryotic phylogenetic tree exists and whether it can be confidently inferred using orthologous genes. We analysed an 11-taxon dataset spanning the deepest divisions of prokaryotic relationships, a 10-taxon dataset spanning the relatively recent gamma-proteobacteria and a 61-taxon dataset spanning both, using species for which complete genomes are available. Congruence among gene trees spanning deep relationships is not better than random. By contrast, a strong, almost perfect phylogenetic signal exists in gamma-proteobacterial genes. Deep-level prokaryotic relationships are difficult to infer because of signal erosion, systematic bias, hidden paralogy and/or HGT. Our results do not preclude levels of HGT that would be inconsistent with the notion of a prokaryotic phylogeny. This approach will help decide the extent to which we can say that there is a prokaryotic phylogeny and where in the phylogeny a cohesive genomic signal exists.

9.
Klebsiella pneumoniae can use nitrate and nitrite as sole nitrogen sources during aerobic growth. Assimilatory nitrate and nitrite reductases convert nitrate through nitrite to ammonium. We report here the molecular cloning of the nasA and nasB genes, which encode assimilatory nitrate and nitrite reductase, respectively. These genes are tightly linked and probably form a nasBA operon. In vivo protein expression and DNA sequence analysis revealed that the nasA and nasB genes encode 92- and 104-kDa proteins, respectively. The NASA polypeptide is homologous to other prokaryotic molybdoenzymes, and the NASB polypeptide is homologous to eukaryotic and prokaryotic NADH-nitrite reductases. The narL gene product positively regulates expression of the structural genes for respiratory nitrate reductase, narGHJI. Surprisingly, we found that the nasBA operon is tightly linked to the narL-narGHJI region in K. pneumoniae, even though the nitrate assimilatory and respiratory enzymes serve different physiological functions.

10.
Recent studies have shown evidence for the coevolution of functionally-related genes. This coevolution is a result of constraints to maintain functional relationships between interacting proteins. The studies have focused on the correlation in gene tree branch lengths of proteins that are directly interacting with each other. We here hypothesize that the correlation in branch lengths is not limited only to proteins that directly interact, but also to proteins that operate within the same pathway. Using generalized linear models as a basis of identifying correlation, we attempted to predict the gene ontology (GO) terms of a gene based on its gene tree branch lengths. We applied our method to a dataset consisting of proteins from ten prokaryotic species. We found that the degree of accuracy to which we could predict the function of the proteins from their gene tree varied substantially with different GO terms. In particular, our model could accurately predict genes involved in translation and certain ribosomal activities with the area of the receiver-operator curve of up to 92%. Further analysis showed that the similarity between the trees of genes labeled with similar GO terms was not limited to genes that physically interacted, but also extended to genes functioning within the same pathway. We discuss the relevance of our findings as they relate to the use of phylogenetic methods in comparative genomics.

11.
The ionotropic glutamate receptor (iGluR) gene family has been widely studied in animals and is determined to be important in excitatory neurotransmission and other neuronal processes. We have previously identified ionotropic glutamate receptor-like genes (GLRs) in Arabidopsis thaliana, an organism that lacks a nervous system. Upon the completion of the Arabidopsis genome sequencing project, a large family of GLR genes has been uncovered. A preliminary phylogenetic analysis divides the AtGLR gene family into three clades and is used as the basis for the recently established nomenclature for the AtGLR gene family. We performed a phylogenetic analysis with extensive annotations of the iGluR gene family, which includes all 20 Arabidopsis GLR genes, the entire iGluR family from rat (except NR3), and two prokaryotic iGluRs, Synechocystis GluR0 and Anabaena GluR. Our analysis supports the division of the AtGLR gene family into three clades and identifies potential functionally important amino acid residues that are conserved in both prokaryotic and eukaryotic iGluRs as well as those that are only conserved in AtGLRs. To begin to investigate whether the three AtGLR clades represent different functional classes, we performed the first comprehensive mRNA expression analysis of the entire AtGLR gene family. On the basis of RT-PCR, all AtGLRs are expressed genes. The three AtGLR clades do not show distinct clade-specific organ expression patterns. All 20 AtGLR genes are expressed in the root. Among them, five of the nine clade-II genes are root-specific in 8-week-old Arabidopsis plants.

12.
Two-component systems including histidine protein kinases represent the primary signal transduction paradigm in prokaryotic organisms. To understand how these systems adapt to allow organisms to detect niche-specific signals, we analyzed the phylogenetic distribution of nearly 5,000 histidine protein kinases from 207 sequenced prokaryotic genomes. We found that many genomes carry a large repertoire of recently evolved signaling genes, which may reflect selective pressure to adapt to new environmental conditions. Both lineage-specific gene family expansion and horizontal gene transfer play major roles in the introduction of new histidine kinases into genomes; however, there are differences in how these two evolutionary forces act. Genes imported via horizontal transfer are more likely to retain their original functionality as inferred from a similar complement of signaling domains, while gene family expansion accompanied by domain shuffling appears to be a major source of novel genetic diversity. Family expansion is the dominant source of new histidine kinase genes in the genomes most enriched in signaling proteins, and detailed analysis reveals that divergence in domain structure and changes in expression patterns are hallmarks of recent expansions. Finally, while these two modes of gene acquisition are widespread across bacterial taxa, there are clear species-specific preferences for which mode is used.

13.
The gene neighborhood in prokaryotic genomes has been effectively utilized in inferring co-functional networks in various organisms. Previously, such genomic context information has been sought among completely assembled prokaryotic genomes. Here, we present a method to infer functional gene networks according to the gene neighborhood in metagenome contigs, which are incompletely assembled genomic fragments. Given that the amount of metagenome sequence data has now surpassed that of completely assembled prokaryotic genomes in the public domain, we expect benefits of inferring networks by the metagenome-based gene neighborhood. We generated co-functional networks for diverse taxonomical species using metagenomics contigs derived from the human microbiome and the ocean microbiome. We found that the networks based on the metagenome gene neighborhood outperformed those based on 1748 completely assembled prokaryotic genomes. We also demonstrated that the metagenome-based gene neighborhood could predict genes related to virulence-associated phenotypes in a bacterial pathogen, indicating that metagenome-based functional links could be sufficiently predictive for some phenotypes of medical importance. Owing to the exponential growth of metagenome sequence data in public repositories, metagenome-based inference of co-functional networks will facilitate understanding of gene functions and pathways in diverse species.

14.
Analysis of 16S rRNA gene sequences has become the primary method for determining prokaryotic phylogeny. Phylogeny is currently the basis for prokaryotic systematics. Therefore, the validity of 16S rRNA gene-based phylogenetic analyses is of fundamental importance for prokaryotic systematics. Discrepancies between 16S rRNA gene analyses and DNA-DNA hybridization and phenotypic analyses have been noted in the genus Helicobacter. To clarify these discrepancies, we sequenced the 23S rRNA genes for 55 helicobacter strains representing 41 taxa (>2,700 bases per sequence). Phylogenetic-tree construction using neighbor-joining, parsimony, and maximum likelihood methods for 23S rRNA gene sequence data yielded stable trees which were consistent with other phenotypic and genotypic methods. The 16S rRNA gene sequence-derived trees were discordant with the 23S rRNA gene trees and other data. Discrepant 16S rRNA gene sequence data for the helicobacters are consistent with the horizontal transfer of 16S rRNA gene fragments and the creation of mosaic molecules with loss of phylogenetic information. These results suggest that taxonomic decisions must be supported by other phylogenetically informative macromolecules, such as the 23S rRNA gene, when 16S rRNA gene-derived phylogeny is discordant with other credible phenotypic and genotypic methods. This study found Wolinella succinogenes to branch with the unsheathed-flagellum cluster of helicobacters by 23S rRNA gene analyses and whole-genome comparisons. This study also found intervening sequences (IVSs) in the 23S rRNA genes of strains of 12 Helicobacter species. IVSs were found in helices 10, 25, and 45, as well as between helices 31' and 27'. Simultaneous insertion of IVSs at three sites was found in H. mesocricetorum.

15.
16.
We compare the annotation of three complete genomes using the ab initio methods of gene identification GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, has been made using GeneMark. We find a number of novel genes which are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark, but are not identified by either of the nonconsensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have bench-marked this program as well as GLIMMER using 3 complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene-identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan as compared to GLIMMER, but in a significant number of cases, similar results are provided by the two techniques. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail.
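
The abstract does not define its Fourier measure. Assuming it refers to the well-known three-base periodicity of protein-coding DNA, the Python sketch below computes the spectral power of a sequence at frequency 1/3. The function name `period3_power` and the normalization by sequence length are illustrative choices, not GeneScan's actual scoring.

```python
import cmath


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over the four bases.

    For each base, the sequence is turned into a 0/1 indicator series and
    its discrete Fourier coefficient at frequency 1/3 is computed. The
    normalization by length is an illustrative assumption.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / n


print(period3_power("ATG" * 30))  # perfectly codon-periodic toy sequence
print(period3_power("AT" * 45))   # period-2 sequence with no period-3 signal
```

A gene finder built on this idea would slide a window along the genome and flag regions where the value stands out against the background; the window size and cutoff would have to be chosen empirically.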

17.
MicroRNAs are small noncoding RNAs that regulate genes post-transcriptionally by binding and degrading target eukaryotic mRNAs. We use a quantitative model to study gene regulation by inhibitory microRNAs and compare it to gene regulation by prokaryotic small non-coding RNAs (sRNAs). Our model uses a combination of analytic techniques as well as computational simulations to calculate the mean-expression and noise profiles of genes regulated by both microRNAs and sRNAs. We find that despite very different molecular machinery and modes of action (catalytic vs stoichiometric), the mean expression levels and noise profiles of microRNA-regulated genes are almost identical to genes regulated by prokaryotic sRNAs. This behavior is extremely robust and persists across a wide range of biologically relevant parameters. We extend our model to study crosstalk between multiple mRNAs that are regulated by a single microRNA and show that noise is a sensitive measure of microRNA-mediated interaction between mRNAs. We conclude by discussing possible experimental strategies for uncovering the microRNA-mRNA interactions and testing the competing endogenous RNA (ceRNA) hypothesis.

18.
The DJ-1 gene is extensively studied because of its involvement in familial Parkinson disease. DJ-1 belongs to a complex superfamily of genes that includes both prokaryotic and eukaryotic representatives. We determine that many prokaryotic groups, such as proteobacteria, cyanobacteria, spirochaetes, firmicutes, or fusobacteria, have genes, often incorrectly called "Thij," that are very close relatives of DJ-1, to the point that they cannot be clearly separated from the eukaryotic DJ-1 genes by phylogenetic analyses of their sequences. In addition, and contrary to a previous study that suggested that DJ-1 genes were animal specific, we show that DJ-1 genes are found in at least 5 of the 6 main eukaryotic groups: opisthokonta (both animals and fungi), plantae, chromalveolata, excavata, and amoebozoa. Our results thus provide strong evidence for DJ-1 genes originating before the origin of eukaryotes. Interestingly, we found that some fungal species, among them the model yeast Schizosaccharomyces pombe, have DJ-1-like genes, most likely orthologous to the animal genes. This finding opens new ways for the analysis of the functions of this group of genes.

19.

Background  

Horizontal gene transfer, also called lateral gene transfer, frequently occurs among prokaryotic organisms, and is considered an important force in their evolution. However, there are relatively few reports of transfer to or from fungi, with some notable exceptions in the acquisition of prokaryotic genes. Some fungal species have been found to contain sequences resembling those of bacterial genes, and with such sequences absent in other fungal species, this has been interpreted as horizontal gene transfer. Similarly, a few fungi have been found to contain genes absent in close relatives but present in more distantly related taxa, and horizontal gene transfer has been invoked as a parsimonious explanation. There is a paucity of direct experimental evidence demonstrating the occurrence of horizontal gene transfer in fungi.

20.
In this report, we identify the human DL-methylmalonyl-CoA racemase gene by analyzing prokaryotic gene arrangements and extrapolating the information obtained to human genes by homology searches. Sequence similarity searches were used to identify two groups of homologues that were frequently arranged with prokaryotic methylmalonyl-CoA mutase genes, and that were of unknown function. Both gene groups had homologues in the human genome. Because methylmalonyl-CoA mutases are involved in the metabolism of propionyl-CoA, we inferred that conserved neighbors of methylmalonyl-CoA mutase genes and their human homologues were also involved in this process. Subsequent biochemical studies confirmed this inference by showing that the prokaryotic gene PH0272 and its human homologue both encode DL-methylmalonyl-CoA racemases. To our knowledge this is the first report in which the function of a eukaryotic gene was determined based on the analysis of prokaryotic gene arrangements. Importantly, such analyses are rapid and may be generally applicable for the identification of human genes that lack homologues of known function or that have been misidentified on the basis of sequence similarity searches.
