Similar literature
 20 similar articles found; search took 62 ms
1.
It is well known that functionally related genes occur in a physically clustered form, especially operons in bacteria. Leveraging this fact, an interesting problem formulation known as the gene team model has recently been proposed, which searches for a set of genes that co-occur in a pair of closely related genomes. However, many gene teams, even experimentally verified operons, are frequently scattered within other genomes. Thus, the gene team model should be refined to reflect this observation. In this paper, we generalize the gene team model, which looks for gene clusters in a physically clustered form, to the multiple-genome case with relaxed constraints. We propose a novel hybrid pattern model that combines the set and the sequential pattern models. Our model searches for gene clusters with and/or without a physical proximity constraint. This model is implemented and tested with 97 genomes (120 replicons). The result was analyzed to show the usefulness of our model. We also compared the result from our hybrid model to those from the traditional gene team model. We also show that predicted gene teams can be used for various genome analyses: operon prediction, phylogenetic analysis of organisms, contextual sequence analysis and genome annotation. Our program is fast enough to provide a service on the web at http://platcom.informatics.indiana.edu/platcom/. Users can select any combination of 97 genomes to predict gene teams.

2.
A gene team is a set of genes that appear in two or more species, possibly in a different order, yet with the distance between adjacent genes in the team for each chromosome always no more than a certain threshold $\delta$. A gene team tree is a succinct way to represent all gene teams for every possible value of $\delta$. In this paper, improved algorithms are presented for the problem of finding the gene teams of two chromosomes and the problem of constructing a gene team tree of two chromosomes. For the problem of finding gene teams, Beal et al. had an $O(n \lg^2 n)$-time algorithm. Our improved algorithm requires $O(n \lg t)$ time, where $t \le n$ is the number of gene teams. For the problem of constructing a gene team tree, Zhang and Leong had an $O(n \lg^2 n)$-time algorithm. Our improved algorithm requires $O(n \lg n \lg\lg n)$ time. Similar to Beal et al.'s gene team algorithm and Zhang and Leong's gene team tree algorithm, our improved algorithms can be extended to $k$ chromosomes with the time complexities increased only by a factor of $k$.
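The gene team definition in this abstract can be illustrated with a minimal sketch. It assumes two chromosomes without duplicated genes, each given as a mapping from gene name to position, and uses a naive repeated-splitting strategy: a candidate set is cut wherever adjacent genes are more than δ apart on either chromosome, until no cut is possible. This is not the $O(n \lg t)$ algorithm of the paper, and the names `split_into_chains` and `gene_teams` are hypothetical.

```python
def split_into_chains(genes, pos, delta):
    """Split genes into maximal runs whose adjacent positions differ by <= delta."""
    ordered = sorted(genes, key=pos.__getitem__)
    if not ordered:
        return []
    chains, current = [], [ordered[0]]
    for gene in ordered[1:]:
        if pos[gene] - pos[current[-1]] <= delta:
            current.append(gene)
        else:
            chains.append(current)
            current = [gene]
    chains.append(current)
    return chains


def gene_teams(pos1, pos2, delta):
    """Return the maximal gene sets that are delta-chains on both chromosomes.

    pos1, pos2: dicts mapping gene name -> position (assumed: no duplicated genes).
    Naive illustration only; singleton teams are included in the result.
    """
    common = frozenset(pos1) & frozenset(pos2)
    teams, stack = [], [common] if common else []
    while stack:
        candidate = stack.pop()
        for pos in (pos1, pos2):
            parts = split_into_chains(candidate, pos, delta)
            if len(parts) > 1:
                # The candidate is broken on this chromosome: refine each part.
                stack.extend(frozenset(p) for p in parts)
                break
        else:
            # No gap larger than delta on either chromosome: this is a team.
            teams.append(candidate)
    return sorted(teams, key=lambda team: min(pos1[g] for g in team))


if __name__ == "__main__":
    chrom1 = {"a": 1, "b": 2, "c": 3, "d": 10, "e": 11}
    chrom2 = {"a": 5, "b": 1, "c": 6, "d": 20, "e": 21}
    for team in gene_teams(chrom1, chrom2, delta=2):
        print(sorted(team))
```

Splitting never separates two genes of the same team, because a team cannot straddle a gap larger than δ within any superset, so the pieces that survive are exactly the maximal sets satisfying the distance constraint on both chromosomes.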

3.
Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A famous model to capture the essential biological features of a conserved gene cluster is called the gene-team model. The problem of finding the gene teams of two general sequences is the focus of this paper. For this problem, He and Goldwasser had an efficient algorithm that requires $O(mn)$ time using $O(m + n)$ working space, where $m$ and $n$ are, respectively, the numbers of genes in the two given sequences. In this paper, a new efficient algorithm is presented. Assume $m \le n$. Let $C = \sum_{\alpha \in \Sigma} o_1(\alpha)\, o_2(\alpha)$, where $\Sigma$ is the set of distinct genes, and $o_1(\alpha)$ and $o_2(\alpha)$ are, respectively, the numbers of copies of $\alpha$ in the two given sequences. Our new algorithm requires $O(\min\{C \lg n, mn\})$ time using $O(m + n)$ working space. As compared with He and Goldwasser's algorithm, our new algorithm is more practical, as $C$ is likely to be much smaller than $mn$ in practice. In addition, our new algorithm is output sensitive. Its running time is $O(\lg n)$ times the size of the output. Moreover, our new algorithm can be efficiently extended to find the gene teams of $k$ general sequences in $O(kC \lg(n_1 n_2 \cdots n_k))$ time, where $n_i$ is the number of genes in the $i$th input sequence.
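To make the quantity C in this abstract concrete, the following sketch computes it for two gene sequences given as lists of gene names and compares it with mn. The function name `match_count` and the toy sequences are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def match_count(seq1, seq2):
    """Compute C: the sum over distinct genes of (copies in seq1) * (copies in seq2)."""
    copies1, copies2 = Counter(seq1), Counter(seq2)
    # Genes missing from either sequence contribute zero, so only shared genes matter.
    return sum(copies1[g] * copies2[g] for g in copies1.keys() & copies2.keys())


if __name__ == "__main__":
    seq1 = ["a", "b", "a", "c"]
    seq2 = ["a", "a", "b", "d"]
    c = match_count(seq1, seq2)
    print(f"C = {c}, mn = {len(seq1) * len(seq2)}")
```

In this toy input only the genes "a" (2 × 2) and "b" (1 × 1) occur in both sequences, so C is well below mn, which is the situation the abstract describes as typical in practice.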

4.
5.
We previously reported two graph algorithms for analysis of genomic information: a graph comparison algorithm to detect locally similar regions called correlated clusters and an algorithm to find a graph feature called P-quasi complete linkage. Based on these algorithms we have developed an automatic procedure to detect conserved gene clusters and align orthologous gene orders in multiple genomes. In the first step, the graph comparison is applied to pairwise genome comparisons, where the genome is considered as a one-dimensionally connected graph with genes as its nodes, and correlated clusters of genes that share sequence similarities are identified. In the next step, the P-quasi complete linkage analysis is applied to group related clusters, and conserved gene clusters in multiple genomes are identified. In the last step, orthologous relations of genes are established within each conserved cluster. We analyzed 17 completely sequenced microbial genomes and obtained 2313 clusters when the completeness parameter P was 40%. About one quarter contained at least two genes that appeared in the metabolic and regulatory pathways in the KEGG database. This collection of conserved gene clusters is used to refine and augment ortholog group tables in KEGG and also to define ortholog identifiers as an extension of EC numbers.

6.
7.
Detecting uber-operons in prokaryotic genomes (total citations: 3; self-citations: 1; citations by others: 3)
Che D, Li G, Mao F, Wu H, Xu Y. Nucleic Acids Research, 2006, 34(8): 2418-2427

8.
9.
10.
Although it is well known that there is no long-range colinearity in gene order in bacterial genomes, it is thought that there are several regions that are under strong structural constraints during evolution, in which gene order is extremely conserved. One such region is the str locus, containing the S10-spc-alpha operons. These operons contain genes coding for ribosomal proteins and a number of housekeeping genes. We compared the organisation of these gene clusters in 111 sequenced prokaryotic genomes (99 bacterial and 12 archaeal genomes). We also compared the organisation to the phylogeny based on 16S ribosomal RNA gene sequences and the sequences of the ribosomal proteins L22, L16 and S14. Our data indicate that there is much variation in gene order and content in these gene clusters, in both bacterial and archaeal genomes. Our data indicate that differential gene loss has occurred on multiple occasions during evolution. We also noted several discrepancies between phylogenetic trees based on 16S rRNA gene sequences and sequences of ribosomal proteins L16, L22 and S14, suggesting that horizontal gene transfer played a significant role in the evolution of the S10-spc-alpha gene clusters.

11.
12.
Prediction of operons in microbial genomes (total citations: 28; self-citations: 7; citations by others: 21)

13.
14.
Analysis of the evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. For the species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce mechanisms of genome rearrangements that have formed each of them. We first compared strains N315 and Mu50, which make one of the most closely related strain pairs, at single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units, likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed the importance of recombination involving long conserved sequences in the evolution of paralogous genes in the genome.

15.
The list of species whose complete DNA sequences have been read is growing steadily, and it is believed that comparative genomics is in its early days. Permutation patterns (groups of genes in some "close" proximity) on gene sequences of genomes across species are being studied under different models, to cope with this explosion of data. The challenge is to (intelligently and efficiently) analyze the genomes in the context of other genomes. In this paper, we present a generalized model that uses three notions: gapped permutation patterns (with gap $g$), genome clusters via a quorum parameter $K > 1$, and possible multiplicity in the patterns. The task is to automatically discover all permutation patterns (with possible multiplicity) that occur with gap $g$ in at least $K$ of the given $m$ genomes. We present an $O(\log m\, N_I + |\Sigma| \log|\Sigma|\, N_O)$-time algorithm, where $m$ is the number of sequences, each defined on $\Sigma$, $N_I$ is the size of the input and $N_O$ is the size of the maximal gene clusters that appear in at least $K$ of the $m$ genomes.

16.
Gene arrangement into operons varies between bacterial species. Genes in a given system can be on one operon in some organisms and on several operons in other organisms. Existing theories explain why genes that work together should be on the same operon, since this allows for advantageous lateral gene transfer and accurate stoichiometry. But what causes the frequent separation into multiple operons of co-regulated genes that act together in a pathway? Here we suggest that separation is due to benefits made possible by differential regulation of each operon. We present a simple mathematical model for the optimal distribution of genes into operons based on a balance of the cost of operons and the benefit of regulation that provides 'just-when-needed' temporal order. The analysis predicts that genes are arranged such that genes on the same operon do not skip functional steps in the pathway. This prediction is supported by genomic data from 137 bacterial genomes. Our work suggests that gene arrangement is not only the result of random historical drift, genome re-arrangement and gene transfer, but has elements that are solutions of an evolutionary optimization problem. Thus gene functional order may be inferred by analyzing the operon structure across different genomes.

17.
18.
The organization of ribosomal proteins in 16 prokaryotic genomes was studied as an example of comparative genome analyses of gene systems. Hypothetical ribosomal protein-containing operons were constructed. These operons also contained putative genes and other non-ribosomal genes. The correspondences among these genes across different organisms were clarified by sequence homology computations. In this way a cross tabulation of 70 ribosomal protein genes was constructed. On average, these were organized into 9-14 operons in each genome. There were also 25 non-ribosomal or putative genes in these mainly ribosomal protein operons. Hence the table contains 95 genes in total. It was found that: (i) the conservation of the block of about 20 r-proteins in the L3 and L4 operons across almost the entire eubacteria and archaebacteria is remarkable; (ii) some operons only belong to eubacteria or archaebacteria; (iii) although the ribosomal protein operons are highly conserved within a domain, there are fine variat

19.
20.
Summary: The organization of the 5S genes in the genome of Tetrahymena thermophila was examined in various strains, with germinal ageing, and the 5S gene clusters were mapped to the MIC chromosomes. When MIC or MAC DNA is cut with the restriction enzyme EcoRI, electrophoresed, blotted, and probed with a 5S rDNA probe, the banding patterns represent the clusters of the 5S rRNA genes as well as flanking regions. The use of long gels and 60 h of electrophoresis at 10 mA permitted resolution of some 30–35 5S gene clusters on fragments ranging in size from 30 kb to 2 kb (bottom of gel). The majority of the 5S gene clusters were found in both MIC and MAC genomes, a few being MIC limited and a few MAC limited. The relative copy number of 5S genes in each cluster was determined by integrating densitometric tracings made from autoradiograms. The total number of copies in the MAC was found to be 33% greater than in the MIC. When different inbred strains were examined, the majority of the 5S gene clusters were found to be conserved, with a few strain-specific clusters observed. Nine nullisomic strains missing both copies of one or more MIC chromosomes were used to map the 5S gene clusters. The clusters were distributed non-randomly to four of the five MIC chromosomes, with 17 of them localized to chromosome 1. A deletion map of chromosome 1 was constructed using various deletion strains. Some of these deletion strains included B strain clones which had been in continuous culture for 15 years. Losses of 5S gene clusters in these ageing MIC could be attributed to deletions of particular chromosomes. The chromosomal distribution of the 5S gene clusters in Tetrahymena is unlike that found for the well-studied eukaryotes, Drosophila and Xenopus.
