Similar literature
20 similar articles found
1.
Assignment of orthologous genes via genome rearrangement   Total citations: 1, self-citations: 0, citations by others: 1
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows a marginally better performance in terms of sensitivity on the real data set because it is able to identify several correct orthologous pairs that are missed by INPARANOID. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm in terms of computing the reversal distance and assigning correct orthologs.
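As a rough illustration of the signed reversal model mentioned in this abstract, the sketch below applies a signed reversal to a gene order and counts breakpoints. It assumes a duplicate-free signed permutation of 1..n, which is simpler than the "with duplicates" setting the abstract addresses, and it is not SOAR's algorithm. The function names are hypothetical. Because a single reversal removes at most two breakpoints, half the breakpoint count gives a lower bound on the reversal distance.

```python
def apply_reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and flip the sign of each gene."""
    return perm[:i] + [-g for g in reversed(perm[i:j + 1])] + perm[j + 1:]


def breakpoints(perm):
    """Count breakpoints of a signed permutation of 1..n relative to the identity.

    The permutation is framed by 0 and n + 1; a consecutive pair (a, b)
    is an adjacency when b - a == 1, and a breakpoint otherwise.
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_lower_bound(perm):
    """Lower bound on reversal distance: ceil(breakpoints / 2)."""
    return (breakpoints(perm) + 1) // 2


genome = [1, -3, -2, 4]                  # hypothetical toy gene order
sorted_genome = apply_reversal(genome, 1, 2)
print(breakpoints(genome), sorted_genome, breakpoints(sorted_genome))
```

In this toy case one reversal of the middle segment restores the identity order, which matches the lower bound of one.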

2.
The identification of orthologs to a set of known genes is often the starting point for evolutionary studies focused on gene families of interest. To date, the existing orthology detection tools (COG, InParanoid, OrthoMCL, etc.) are aimed at genome-wide ortholog identification and lack flexibility for the purposes of case studies. We developed a program, OrthoFocus, which employs an extended reciprocal best hit approach to quickly search for orthologs in a pair of genomes. A group of paralogs from the input genome is used as the start for the forward search and the criterion for the reverse search, which allows handling many-to-one and many-to-many relationships. By pairwise comparison of genomes with the input species genome, OrthoFocus enables quick identification of orthologs in multiple genomes and generates a multiple alignment of orthologs so that it can further be used in phylogenetic analysis. The program is available at http://www.lipidomics.ru/.
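For orientation, the plain reciprocal best hit (RBH) criterion that this abstract extends can be sketched as follows. This is not OrthoFocus's extended procedure: it handles only one-to-one pairs, and the input format (a dictionary of pairwise similarity scores, higher meaning more similar) and all names are assumptions made for illustration.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.

    scores: dict mapping (query, subject) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores for two small genomes A and B.
a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 50, ("a2", "b2"): 70, ("a3", "b1"): 60}
b_vs_a = {("b1", "a1"): 88, ("b2", "a2"): 65, ("b2", "a1"): 40}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here a3 is dropped because its best hit b1 points back to a1. Cases like this are the many-to-one relationships that, according to the abstract, the extended approach is designed to handle.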

6.
Changes in the patterns produced by annealing restriction endonuclease digests of bacterial genomes with probe deoxyribonucleic acids (DNAs) containing small portions of a bacterial genome provide a sensitive indicator of the degree of nucleotide sequence relatedness that exists in localized regions of the genomes of closely related bacteria. We have used five probe DNAs to explore the relatedness of parts of the genomes of six laboratory Escherichia coli strains. A range in the amount of variability in the positions of restriction enzyme cleavage sites in the selected portions of the genomes was found. Portions of the genome that are believed to be inactive were more variable than portions that contained functional genes: the sites in and near regions of homology to phage lambda DNA in the genome showed the greatest variability. These regions probably represent remnants of cryptic prophages. Variability was assessed pairwise among four of the E. coli strains and ranged from 5 to > 25% base pair substitutions in the lambda-related regions. In contrast, the endonuclease cleavage sites in the trp, tna, lac, thy regions, and one other as-yet-unidentified segment of the genome were more highly conserved. It seems likely that these sites lie in genetic locations that are subject to functional constraints.

7.
Pair rule gene orthologs in spider segmentation   Total citations: 4, self-citations: 0, citations by others: 4
The activation of pair rule genes is the first indication of the metameric organization of the Drosophila embryo and thus forms a key step in the segmentation process. There are two classes of pair rule genes in Drosophila: the primary pair rule genes that are directly activated by the maternal and gap genes and the secondary pair rule genes that rely on input from the primary pair rule genes. Here we analyze orthologs of Drosophila primary and secondary pair rule genes in the spider Cupiennius salei. The expression patterns of the spider pair rule gene orthologs can be subdivided into three groups: even-skipped and runt-1 expression is in stripes that start at the posterior end of the growth zone and their expression ends before the stripes reach the anterior end of the growth zone, while hairy and pairberry-3 stripes also start at the posterior end, but do not cease in the anterior growth zone. Stripes of odd-paired, odd-skipped-related-1, and sloppy paired are only found in the anterior portion of the growth zone. The various genes thus seem to be active during different phases of segment specification. It is notable that the spider orthologs of the Drosophila primary pair rule genes are active more posterior in the growth zone and thus during earlier phases of segment specification than most orthologs of Drosophila secondary pair rule genes, indicating that parts of the hierarchy might be conserved between flies and spiders. The spider ortholog of the Drosophila pair rule gene fushi tarazu is not expressed in the growth zone, but is expressed in a Hox-like fashion. The segmentation function of fushi tarazu thus appears to be a newly acquired role of the gene in the lineage of the mandibulate arthropods.

8.
Guo X, Bao J, Fan L. FEBS Letters, 2007, 581(5): 1015-1021
Two gene classes characterized by high and low GC content have been found in rice and other cereals, but not dicot genomes. We used paralogs with high and low GC contents in rice and found: (a) a greater increase in GC content at exonic fourfold-redundant sites than at flanking introns; (b) with reference to their orthologs in Arabidopsis, most substitution sites between the two kinds of paralogs are found at 2- and 4-degenerate sites with a T→C mode, while A→C and A→G play major roles at 0-degenerate sites; and (c) high-GC genes have greater bias and codon usage is skewed toward codons that are preferred in highly expressed genes. We believe this is strong evidence for selectively driven codon usage in rice. Another cereal, maize, also showed the same trend as in rice. This represents a potential evolutionary process for the origin of genes with a high GC content in rice and other cereals.

9.
ABSTRACT: BACKGROUND: A new strain of Geobacter sulfurreducens, strain KN400, produces more electrical current in microbial fuel cells and reduces insoluble Fe(III) oxides much faster than the wildtype strain, PCA. The genome of KN400 was compared to wildtype with the goal of discovering how the network for extracellular electron transfer has changed and how these two strains evolved. RESULTS: Both genomes were re-annotated, resulting in 14 fewer genes (net) in the PCA genome; 28 fewer (net) in the KN400 genome; and ca. 400 gene start and stop sites moved. 96% of genes in KN400 had clear orthologs with conserved synteny in PCA. Most of the remaining genes were in regions of genomic mobility and were strain-specific or conserved in other Geobacteraceae, indicating that the changes occurred post-divergence. There were 27,270 single nucleotide polymorphisms (SNP) between the genomes. There was significant enrichment for SNP locations in non-coding or synonymous amino acid sites, indicating significant selective pressure since the divergence. 25% of orthologs had sequence differences, and this set was enriched in phosphorylation and ATP-dependent enzymes. Substantial sequence differences (at least 12 non-synonymous SNP/kb) were found in 3.6% of the orthologs, and this set was enriched in cytochromes and integral membrane proteins. Genes known to be involved in electron transport, those used in the metabolic cell model, and those that exhibit changes in expression during growth in microbial fuel cells were examined in detail. CONCLUSIONS: The improvement in external electron transfer in the KN400 strain does not appear to be due to novel gene acquisition, but rather to changes in the common metabolic network. The increase in electron transfer rate and yield in KN400 may be due to changes in carbon flux towards oxidation pathways and to changes in ATP metabolism, both of which indicate that the overall energy state of the cell may be different. 
The electrically conductive pili appear to be unchanged, but cytochrome folding, localization, and redox potentials may all be affected, which would alter the electrical connection between the cell and the substrate.

10.
A total of 37 complete genome sequences of bacteria, archaea, and eukaryotes were compared. The percentage of orthologous genes of each species contained within any of the other 36 genomes was established. In addition, the mean identity of the orthologs was calculated. Several conclusions result: (i) a greater absolute number of orthologs of a given species is found in larger species than in smaller ones; (ii) a greater percentage of the orthologous genes of smaller genomes is contained in other species than is the case for larger genomes, which corresponds to a larger proportion of essential genes; (iii) before species can be specifically related to one another in terms of gene content, it is first necessary to correct for the size of the genome; (iv) eukaryotes have a significantly smaller percentage of bacterial orthologs after correction for genome size, which is consistent with their placement in a separate domain; (v) the archaebacteria are specifically related to one another but are not significantly different in gene content from the bacteria as a whole; (vi) determination of the mean identity of all orthologs (involving hundreds of gene comparisons per genome pair) reduces the impact of errors in misidentification of orthologs and to misalignments, and thus it is far more reliable than single gene comparisons; (vii) however, there is a maximum amount of change in protein sequences of 37% mean identity, which limits the use of percentage sequence identity to the lower taxa, a result which should also be true for single gene comparisons of both proteins and rRNA; (viii) most of the species that appear to be specifically related based upon gene content also appear to be specifically related based upon the mean identity of orthologs; (ix) the genes of a majority of species considered in this study have diverged too much to allow the construction of all-encompassing evolutionary trees. 
However, we have shown that eight species of gram-negative bacteria, six species of gram-positive bacteria, and eight species of archaebacteria are specifically related in terms of gene content, mean identity of orthologs, or both.

11.
MOTIVATION: Genes with identical patterns of occurrence across the phyla tend to function together in the same protein complexes or participate in the same biochemical pathway. However, the requirement that the profiles be identical (i) severely restricts the number of functional links that can be established by such phylogenetic profiling; (ii) limits detection to very strong functional links, failing to capture relations between genes that are not in the same pathway, but nevertheless subserve a common function and (iii) misses relations between analogous genes. Here we present and apply a method for relaxing the restriction, based on the probability that a given arbitrary degree of similarity between two profiles would occur by chance, with no biological pressure. Function is then inferred at any desired level of confidence. RESULTS: We derive an expression for the probability distribution of a given number of chance co-occurrences of a pair of non-homologous orthologs across a set of genomes. The method is applied to 2905 clusters of orthologous genes (COGs) from 44 fully sequenced microbial genomes representing all three domains of life. Among the results are the following. (1) Of the 51 000 annotated intrapathway gene pairs, 8935 are linked at a level of significance of 0.01. This is over 30-fold greater than the 271 intrapathway pairs obtained at the same confidence level when identical profiles are used. (2) Of the 540 000 interpathway gene pairs, some 65 000 are linked at the 0.01 level of significance, some 12 standard deviations beyond the number expected by chance at this confidence level. We speculate that many of these links involve nearest-neighbor path, and discuss some examples. (3) The difference in the percentage of linked interpathway and intrapathway genes is highly significant, consistent with the intuitive expectation that genes in the same pathway are generally under greater selective pressure than those that are not. (4) The method appears to recover metabolic networks well. This is illustrated by the TCA cycle, which is recovered as a highly connected, weighted edge network of 30 of its 31 COGs. (5) The fraction of pairs having a common pathway is a symmetric function of the Hamming distance between their profiles. This finding, that the functional correlation between profiles with near maximum Hamming distance is as large as between profiles with near zero Hamming distance, and as statistically significant, is plausibly explained if the former group represents analogous genes.
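The abstract does not give the derived probability expression, so the sketch below substitutes a standard hypergeometric null as an assumption. If one gene occurs in n_a of N genomes and another in n_b, and the occurrences are placed independently at random, the chance of exactly k co-occurrences is C(n_a, k) C(N - n_a, n_b - k) / C(N, n_b). The function names and toy profiles are hypothetical; profiles are 0/1 lists with one entry per genome.

```python
from math import comb


def hamming_distance(profile_a, profile_b):
    """Number of genomes in which exactly one of the two genes is present."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def chance_cooccurrence_pvalue(profile_a, profile_b):
    """P(at least the observed number of co-occurrences) under a hypergeometric null."""
    n_genomes = len(profile_a)
    n_a = sum(profile_a)
    n_b = sum(profile_b)
    observed = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    total = comb(n_genomes, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_genomes - n_a, n_b - k)
        for k in range(observed, min(n_a, n_b) + 1)
    )
    return tail / total


same = [1, 1, 1, 0, 0, 0]
opposite = [0, 0, 0, 1, 1, 1]
print(hamming_distance(same, same), chance_cooccurrence_pvalue(same, same))
print(hamming_distance(same, opposite), chance_cooccurrence_pvalue(same, opposite))
```

This one-sided tail scores only excess co-occurrence, so the fully complementary pair gets a p-value of 1. Detecting the anti-correlated (possibly analogous) pairs described in result (5) would require the opposite tail or a two-sided test.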

12.
We have identified conserved orthologs in completely sequenced genomes of double-strand DNA phages and arranged them into evolutionary families (phage orthologous groups [POGs]). Using this resource to analyze the collection of known phage genomes, we find that most orthologs are unique in their genomes (having no diverged duplicates [paralogs]), and while many proteins contain multiple domains, the evolutionary recombination of these domains does not appear to be a major factor in evolution of these orthologous families. The number of POGs has been rapidly increasing over the past decade, the percentage of genes in phage genomes that have orthologs in other phages has also been increasing, and the percentage of unknown "ORFans" is decreasing as more proteins find homologs and establish a family. Other properties of phage genomes have remained relatively stable over time, most notably the high fraction of genes that are never or only rarely observed in their cellular hosts. This suggests that despite the renowned ability of phages to transduce cellular genes, these cellular "hitchhiker" genes do not dominate the phage genomic landscape, and a large fraction of the genes in phage genomes maintain an evolutionary trajectory that is distinct from that of the host genes.

13.
Mitochondrial ribosomes contain bacterial-type proteins reflecting their endosymbiotic heritage, and a subset of these genes is retained within the mitochondrion in land plants. Variation in gene location is observed, however, because migration to the nucleus is still an ongoing evolutionary process in plants. To gain insights into adaptation events related to successful gene transfer, we have compiled data for bacterial-origin mitochondrial-type ribosomal protein genes from the completely sequenced Arabidopsis and rice genomes. Approximately 75% of such nuclear-located genes encode amino-terminal extensions relative to their Escherichia coli counterparts, and of that set, only about 30% have introns at (or near) the junction in support of an exon shuffling-type recruitment of upstream expression/targeting signals. We find that genes that were transferred to the nucleus early in eukaryotic evolution have, on average, about twofold higher density of introns within the core ribosomal protein sequences than do those that moved to the nucleus more recently. About 20% of such introns are at positions identical to those in human orthologs, consistent with their ancestral presence. Plant mitochondrial-type ribosomal protein genes have dispersed chromosomal locations in the nucleus, and about 20% of them are present in multiple unlinked copies. This study provides new insights into the evolutionary history of endosymbiotic bacterial-type genes that have been transferred from the mitochondrion to the nucleus.

14.
The identification of single copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide-enough distribution of taxa with sufficiently high quality sequenced genomes to gain confidence in the wide-spread single copy status of a gene. Here, we present a phylogenetic approach for identifying overlooked single copy orthologs from multigene families and apply it to the Metazoa. Using 18 sequenced metazoan genomes of high quality we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. We found that the use of the phylogenetic procedure increased the number of single copy orthologs found by over a third more than standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence. To demonstrate the value of our set of single copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human) with an average of 92%, suggesting a value for the underlying error rate in genome annotation, and a strategy for identifying single copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are largely under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived.

16.

Background

Model organisms have contributed substantially to our understanding of the etiology of human disease as well as having assisted with the development of new treatment modalities. The availability of the human, mouse and, most recently, the rat genome sequences now permits the comprehensive investigation of the rodent orthologs of genes associated with human disease. Here, we investigate whether human disease genes differ significantly from their rodent orthologs with respect to their overall levels of conservation and their rates of evolutionary change.

Results

Human disease genes are unevenly distributed among human chromosomes and are highly represented (99.5%) among human-rodent ortholog sets. Differences are revealed in evolutionary conservation and selection between different categories of human disease genes. Although selection appears not to have greatly discriminated between disease and non-disease genes, synonymous substitution rates are significantly higher for disease genes. In neurological and malformation syndrome disease systems, associated genes have evolved slowly whereas genes of the immune, hematological and pulmonary disease systems have changed more rapidly. Amino-acid substitutions associated with human inherited disease occur at sites that are more highly conserved than the average; nevertheless, 15 substituting amino acids associated with human disease were identified as wild-type amino acids in the rat. Rodent orthologs of human trinucleotide repeat-expansion disease genes were found to contain substantially fewer of such repeats. Six human genes that share the same characteristics as triplet repeat-expansion disease-associated genes were identified; although four of these genes are expressed in the brain, none is currently known to be associated with disease.

Conclusions

Most human disease genes have been retained in rodent genomes. Synonymous nucleotide substitutions occur at a higher rate in disease genes, a finding that may reflect increased mutation rates in the chromosomal regions in which disease genes are found. Rodent orthologs associated with neurological function exhibit the greatest evolutionary conservation; this suggests that rodent models of human neurological disease are likely to most faithfully represent human disease processes. However, with regard to neurological triplet repeat expansion-associated human disease genes, the contraction, relative to human, of rodent trinucleotide repeats suggests that rodent loci may not achieve a 'critical repeat threshold' necessary to undergo spontaneous pathological repeat expansions. The identification of six genes in this study that have multiple characteristics associated with repeat expansion-disease genes raises the possibility that not all human loci capable of facilitating neurological disease by repeat expansion have as yet been identified.

17.
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics, since many computational methods for solving various biological problems critically rely on bona fide orthologs as input. While it is usually done using sequence similarity search, we recently proposed a new combinatorial approach that combines sequence similarity and genome rearrangement. This paper continues the development of the approach and unites genome rearrangement events and (post-speciation) duplication events in a single framework under the parsimony principle. In this framework, orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario involving both genome rearrangement and (post-speciation) gene duplication. Besides several original algorithmic contributions, the enhanced method allows for the detection of inparalogs. Following this approach, we have implemented a high-throughput system for ortholog assignment on a genome scale, called MSOAR, and applied it to human and mouse genomes. As the results will show, MSOAR is able to find 99 more true orthologs than the INPARANOID program did. In comparison to the iterated exemplar algorithm on simulated data, MSOAR performed favorably in terms of assignment accuracy. We also validated our predicted main ortholog pairs between human and mouse using public ortholog assignment datasets, synteny information, and gene function classification. These test results indicate that our approach is very promising for genome-wide ortholog assignment. Supplemental material and the MSOAR program are available at http://msoar.cs.ucr.edu.

18.
An in silico comparative genomics approach was used to identify putative orthologs to genetically mapped genes from the mosquito, Aedes aegypti, in the Drosophila melanogaster and Anopheles gambiae genome databases. Comparative chromosome positions of 73 D. melanogaster orthologs indicated significant deviations from a random distribution across each of the five A. aegypti chromosomal regions, suggesting that some ancestral chromosome elements have been conserved. However, the two genomes also reflect extensive reshuffling within and between chromosomal regions. Comparative chromosome positions of A. gambiae orthologs indicate unequivocally that A. aegypti chromosome regions share extensive homology to the five A. gambiae chromosome arms. Whole-arm or near-whole-arm homology was contradicted by only two genes among the 75 A. aegypti genes for which orthologs to A. gambiae were identified. The two genomes contain large conserved chromosome segments that generally correspond to break/fusion events and a reciprocal translocation with extensive paracentric inversions evident within. Only very tightly linked genes are likely to retain conserved linear orders within chromosome segments. The D. melanogaster and A. gambiae genome databases therefore offer limited potential for comparative positional gene determinations among even closely related dipterans, indicating the necessity for additional genome sequencing projects with other dipteran species.

19.
An automated comparative analysis of 17 complete microbial genomes   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: As sequenced genomes become larger and sequencing becomes faster, there is a need to develop accurate automated genome comparison techniques and databases to facilitate derivation of genome functionality; identification of enzymes, putative operons and metabolic pathways; and to derive phylogenetic classification of microbes. RESULTS: This paper extends an automated pair-wise genome comparison technique (Bansal et al., Math. Model. Sci. Comput., 9, 1-23, 1998, Bansal and Bork, in First International Workshop of Declarative Languages, Springer, pp. 275-289, 1999) used to identify orthologs and gene groups to derive orthologous genes in a group of genomes and to identify genes with conserved functionality. Seventeen microbial genomes archived at ftp://ncbi.nlm.nih.gov/genbank/genomes have been compared using the automated technique. Data related to orthologs, gene groups, gene duplication, gene fusion, orthologs with conserved functionality, and genes specifically orthologous to Escherichia coli and pathogens has been presented and analyzed. AVAILABILITY: A prototype database is available at ftp://www.mcs.kent.edu/arvind/intellibio/orthos.html. The software is free for academic research under an academic license. The detailed database for every microbial genome in NCBI is commercially available through intellibio software and consultancy corporation (Web site: http://www.mcs.kent.edu/?rvind/intellibio.html). CONTACT: arvind@mcs.kent.edu.

20.
Roundup: a multi-genome repository of orthologs and evolutionary distances   Total citations: 1, self-citations: 0, citations by others: 1
SUMMARY: We have created a tool for ortholog and phylogenetic profile retrieval called Roundup. Roundup is backed by a massive repository of orthologs and associated evolutionary distances that was built using the reciprocal smallest distance algorithm, an approach that has been shown to improve upon alternative approaches of ortholog detection, such as reciprocal blast. Presently, the Roundup repository contains all possible pair-wise comparisons for over 250 genomes, including 32 Eukaryotes, more than doubling the coverage of any similar resource. The orthologs are accessible through an intuitive web interface that allows searches by genome or gene identifier, presenting results as phylogenetic profiles together with gene and molecular function annotations. Results may be downloaded as phylogenetic matrices for subsequent analysis, including the construction of whole-genome phylogenies based on gene-content data. AVAILABILITY: http://rodeo.med.harvard.edu/tools/roundup.
