Similar literature
 20 similar articles found (search time: 40 ms)
1.
Li G, Ma Q, Mao X, Yin Y, Zhu X, Xu Y. Nucleic Acids Research, 2011, 39(22): e150
Existing methods for orthologous gene mapping suffer from two general problems: (i) they are computationally too slow and their results are difficult to interpret for automated large-scale applications when based on phylogenetic analyses; or (ii) they are too prone to making mistakes in dealing with complex situations involving horizontal gene transfers and gene fusion due to the lack of a sound basis when based on sequence similarity information. We present a novel algorithm, Global Optimization Strategy (GOST), for orthologous gene mapping through combining sequence similarity and contextual (working partners) information, using a combinatorial optimization framework. Genome-scale applications of GOST show substantial improvements over the predictions by three popular sequence similarity-based orthology mapping programs. Our analysis indicates that our algorithm overcomes the intrinsic issues faced by sequence similarity-based methods, when orthology mapping involves gene fusions and horizontal gene transfers. Our program runs as efficiently as the most efficient sequence similarity-based algorithm in the public domain. GOST is freely downloadable at http://csbl.bmb.uga.edu/~maqin/GOST.

2.
3.
In the field of phylogenetics and comparative genomics, it is important to establish orthologous relationships when comparing homologous sequences. Because orthologs and paralogs differ only slightly in sequence, paralogs are easily mistaken for orthologs. For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Depending on their algorithmic implementations, each of these methods sometimes has increased false negative or false positive rates. Here, we developed a novel algorithm for orthology detection that uses a distance method based on the phylogenetic criterion of minimum evolution. Our algorithm assumes that sets of sequences exhibiting orthologous relationships are evolutionarily less costly than sets that include one or more paralogous relationships. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Unlike tree reconciliation, our algorithm appears free from the problem of incorrect topologies of species and gene trees. The reliability of the algorithm was tested in a comparative analysis with two other orthology detection methods using 95 manually curated KOG datasets and 21 experimentally verified EXProt datasets. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.

4.
MOTIVATION: The determination of gene orthology is a prerequisite for mining and utilizing the rapidly increasing amount of sequence data for genome-scale phylogenetics and comparative genomic studies. Until now, most researchers have used pairwise distance comparison algorithms, such as BLAST, COG, RBH, RSD and INPARANOID, to determine gene orthology. In contrast, orthology determination within a character-based phylogenetic framework has not been utilized on a genomic scale owing to the lack of efficiency and automation. RESULTS: We have developed OrthologID, a Web application that automates the labor-intensive procedures of gene orthology determination within a character-based phylogenetic framework, thus making character-based orthology determination on a genomic scale possible. In addition to generating gene family trees and determining orthologous gene sets for complete genomes, OrthologID can also identify diagnostic characters that define each orthologous gene set, as well as diagnostic characters that are responsible for classifying query sequences from other genomes into specific orthology groups. The OrthologID database currently includes several complete plant genomes, including Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, as well as a unicellular outgroup, Chlamydomonas reinhardtii. To improve the general utility of OrthologID beyond plant species, we plan to expand our sequence database to include the fully sequenced genomes of prokaryotes and other non-plant eukaryotes. AVAILABILITY: http://nypg.bio.nyu.edu/orthologid/

5.
6.
MOTIVATION: Determining orthology relations among genes across multiple genomes is an important problem in the post-genomic era. Identifying orthologous genes can not only help predict functional annotations for newly sequenced or poorly characterized genomes, but can also help predict new protein-protein interactions. Unfortunately, determining orthology relation through computational methods is not straightforward due to the presence of paralogs. Traditional approaches have relied on pairwise sequence comparisons to construct graphs, which were then partitioned into putative clusters of orthologous groups. These methods do not attempt to preserve the non-transitivity and hierarchic nature of the orthology relation. RESULTS: We propose a new method, COCO-CL, for hierarchical clustering of homology relations and identification of orthologous groups of genes. Unlike previous approaches, which are based on pairwise sequence comparisons, our method explores the correlation of evolutionary histories of individual genes in a more global context. COCO-CL can be used as a semi-independent method to delineate the orthology/paralogy relation for a refined set of homologous proteins obtained using a less-conservative clustering approach, or as a refiner that removes putative out-paralogs from clusters computed using a more inclusive approach. We analyze our clustering results manually, with support from literature and functional annotations. Since our orthology determination procedure does not employ a species tree to infer duplication events, it can be used in situations when the species tree is unknown or uncertain. CONTACT: jothi@mail.nih.gov, przytyck@mail.nih.gov SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.

7.

Background  

The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to qualify ortholog identification methods.

8.
Zhao Z, Thomas JH, Chen N, Sheps JA, Baillie DL. Genetics, 2007, 175(3): 1407-1418
ABC transporters constitute one of the largest gene families in all species. They are mostly involved in transport of substrates across membranes. We have previously demonstrated that the Caenorhabditis elegans ABC family shows poor one-to-one gene orthology with other distant model organisms. To address the evolution dynamics of this gene family among closely related species, we carried out a comparative analysis of the ABC family among the three nematode species C. elegans, C. briggsae, and C. remanei. In contrast to the previous observations, the majority of ABC genes in the three species were found in orthologous trios, including many tandemly duplicated ABC genes, indicating that the gene duplication took place before speciation. Species-specific expansions of ABC members are rare and mostly observed in subfamilies A and B. C. briggsae and C. remanei orthologous ABC genes tend to cluster on trees, with those of C. elegans as an outgroup, consistent with their proposed species phylogeny. Comparison of intron/exon structures of the highly conserved ABCE subfamily members also indicates a closer relationship between C. briggsae and C. remanei than between either of these species and C. elegans. A comparison between insect and mammalian species indicates lineage-specific duplications or deletions of ABC genes, while the family size remains relatively constant. Sites undergoing positive selection within subfamily D, which are implicated in very-long-chain fatty acid transport, were identified. The evolution of these sites might be driven by the changes in food source with time.

9.
10.
MOTIVATION: Phylogenomics integrates the vast amount of phylogenetic information contained in complete genome sequences, and is rapidly becoming the standard for reliably inferring species phylogenies. There are, however, fundamental differences between the ways in which phylogenomic approaches like gene content, superalignment, superdistance and supertree integrate the phylogenetic information from separate orthologous groups. Furthermore, they all depend on the method by which the orthologous groups are initially determined. Here, we systematically compare these four phylogenomic approaches, in parallel with three approaches for large-scale orthology determination: pairwise orthology, cluster orthology and tree-based orthology. RESULTS: Including various phylogenetic methods, we apply a total of 54 fully automated phylogenomic procedures to the fungi, the eukaryotic clade with the largest number of sequenced genomes, for which we retrieved a gold standard phylogeny from the literature. Phylogenomic trees based on gene content show, relative to the other methods, a bias in the tree topology that parallels convergence in lifestyle among the species compared, indicating convergence in gene content. CONCLUSIONS: Complete genomes are no guarantee of good or even consistent phylogenies. However, the large amounts of data in genomes enable us to carefully select the data most suitable for phylogenomic inference. In terms of performance, the superalignment approach, combined with restrictive orthology, is the most successful in recovering a fungal phylogeny that agrees with current taxonomic views, and allows us to obtain a high-resolution phylogeny. We provide solid support for what has grown to be a common practice in phylogenomics during its advance in recent years. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
MOTIVATION: The complete sequencing of many genomes has made it possible to identify orthologous genes descending from a common ancestor. However, reconstruction of evolutionary history over long time periods faces many challenges due to gene duplications and losses. Identification of orthologous groups shared by multiple proteomes therefore becomes a clustering problem in which an optimal compromise between conflicting evidence needs to be found. RESULTS: Here we present a new proteome-scale analysis program called MultiParanoid that can automatically find orthology relationships between proteins in multiple proteomes. The software is an extension of the InParanoid program that identifies orthologs and inparalogs in pairwise proteome comparisons. MultiParanoid applies a clustering algorithm to merge multiple pairwise ortholog groups from InParanoid into multi-species ortholog groups. To avoid outparalogs in the same cluster, MultiParanoid only combines species that share the same last ancestor. To validate the clustering technique, we compared the results to a reference set obtained by manual phylogenetic analysis. We further compared the results to ortholog groups in KOGs and OrthoMCL, which revealed that MultiParanoid produces substantially fewer outparalogs than these resources. AVAILABILITY: MultiParanoid is a freely available standalone program that enables efficient orthology analysis much needed in the post-genomic era. A web-based service providing access to the original datasets, the resulting groups of orthologs, and the source code of the program can be found at http://multiparanoid.cgb.ki.se.
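Note: the abstract above does not say which clustering algorithm MultiParanoid uses. As a rough illustration of the general idea of merging pairwise ortholog groups into multi-species groups, the sketch below assumes plain single-linkage merging with a union-find structure: any two pairwise groups that share a gene end up in the same cluster. The function name `merge_pairwise_groups` and the gene identifiers are hypothetical, and the outparalog-avoidance step described in the abstract is not modelled.

```python
def merge_pairwise_groups(groups):
    """Merge pairwise ortholog groups that share at least one gene.

    Assumption: single-linkage merging via union-find. This is an
    illustration only, not MultiParanoid's actual clustering rule.
    """
    parent = {}

    def find(gene):
        # Register unseen genes, then walk up to the root with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for group in groups:
        members = sorted(group)
        if not members:
            continue
        root = find(members[0])
        for other in members[1:]:
            parent[find(other)] = root

    clusters = {}
    for gene in parent:
        clusters.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical pairwise groups from three species (hs, mm, dr).
pairwise = [{"hs_G1", "mm_G1"}, {"mm_G1", "dr_G1"}, {"hs_G2", "mm_G2"}]
print(merge_pairwise_groups(pairwise))
```

Single-linkage merging is the simplest possible choice and tends to over-merge when one pairwise group is wrong, which is one reason a real tool would add further constraints.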

12.
13.
The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
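Note: of the approaches named in the abstract above, the bidirectional best hit (BBH) criterion is simple enough to sketch. Two genes from genomes A and B are called orthologs when each is the other's highest-scoring hit. The sketch assumes that alignment scores are already available as dictionaries keyed by (query, target); the function names, gene identifiers and scores are hypothetical and are not taken from the study.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene."""
    best = {}
    for (query, target), score in scores.items():
        # Strict '>' keeps the first target seen when scores tie.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical alignment scores between genomes A and B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 180, ("a3", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b2", "a2"): 180, ("b2", "a1"): 150}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Here a3 finds b2 as its best hit, but b2 prefers a2, so a3 is left without an ortholog. This is the kind of case where duplication or loss makes the criterion conservative.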

14.
Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL [Markov clustering algorithm]) is a tool that automates the process of identifying clusters of orthologous genes from precomputed phylogenetic trees and classifying gene families. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the MCL to identify orthology clusters and provide annotated gene families. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with very high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs and phylogeny-aware gene annotations that can be used to inform comparative genomics and gene family evolution analyses.
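Note: the species overlap step mentioned above can be illustrated with a small sketch. In its usual formulation, an internal node of a rooted gene tree is treated as a duplication if the species sets of its two child subtrees overlap, and as a speciation otherwise; genes separated by a speciation node are called orthologs. The code below is not Possvm's implementation. It assumes a rooted binary tree written as nested tuples and leaf names of the hypothetical form `Species_gene`, and it leaves out the subsequent MCL clustering.

```python
from itertools import product


def species_of(leaf):
    """Extract the species prefix from a leaf named 'Species_gene' (assumed format)."""
    return leaf.split("_", 1)[0]


def orthologous_pairs(tree):
    """Collect gene pairs whose last common ancestor is a speciation node."""
    pairs = set()

    def visit(node):
        if isinstance(node, str):  # leaf
            return [node]
        left, right = (visit(child) for child in node)
        left_species = {species_of(leaf) for leaf in left}
        right_species = {species_of(leaf) for leaf in right}
        # No shared species between the two subtrees: treat as speciation.
        if not (left_species & right_species):
            pairs.update(frozenset(pair) for pair in product(left, right))
        return left + right

    visit(tree)
    return pairs


# Hypothetical gene tree: a duplication before the human/mouse split,
# with a single fly gene as outgroup.
tree = ((("Hsap_A1", "Mmus_A1"), ("Hsap_A2", "Mmus_A2")), "Dmel_A")
for pair in sorted(sorted(p) for p in orthologous_pairs(tree)):
    print(pair)
```

In this example Hsap_A1 and Mmus_A2 are not reported as orthologs because their last common ancestor is the duplication node, whereas the fly gene is orthologous to all four vertebrate genes.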

15.
The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities because more exact approaches are computationally too costly. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

16.

Background  

Orthology is one of the cornerstones of gene function prediction. Dividing the phylogenetic relations between genes into either orthologs or paralogs is, however, an oversimplification. Already in two-species gene-phylogenies, the complicated, non-transitive nature of phylogenetic relations results in inparalogs and outparalogs. For situations with more than two species we lack semantics to specifically describe the phylogenetic relations, let alone to exploit them. Published procedures to extract orthologous groups from phylogenetic trees do not allow identification of orthology at various levels of resolution, nor do they document the relations between the orthologous groups.

17.
Conant GC, Wolfe KH. Genetics, 2008, 179(3): 1681-1692
Identification of orthologous genes across species becomes challenging in the presence of a whole-genome duplication (WGD). We present a probabilistic method for identifying orthologs that considers all possible orthology/paralogy assignments for a set of genomes with a shared WGD (here five yeast species). This approach allows us to estimate how confident we can be in the orthology assignments in each genomic region. Two inferences produced by this model are indicative of purifying selection acting to prevent duplicate gene loss. First, our model suggests that there are significant differences (up to a factor of seven) in duplicate gene half-life. Second, we observe differences between the genes that the model infers to have been lost soon after WGD and those lost more recently. Gene losses soon after WGD appear uncorrelated with gene expression level and knockout fitness defect. However, later losses are biased toward genes whose paralogs have high expression and large knockout fitness defects, as well as showing biases toward certain functional groups such as ribosomal proteins. We suggest that while duplicate copies of some genes may be lost neutrally after WGD, another set of genes may be initially preserved in duplicate by natural selection for reasons including dosage.

18.
Knight RD, Shimeld SM. Genome Biology, 2001, 2(5): research0016.1-research0016.8
Background: Identification of orthologous relationships between genes from widely divergent taxa allows partial reconstruction of the gene complement of ancestral genomes. C2H2 zinc-finger genes are one of the largest and most complex gene superfamilies in metazoan genomes, with hundreds of members in the human genome. Here we analyze C2H2 zinc-finger genes from three taxa - Drosophila, Caenorhabditis elegans and human - from which near-complete genome sequence data are available. Results: Our analyses conclusively identify 39 families of genes, of which 38 can be defined as orthology groups in that they are descended from single ancestral genes in the common ancestor of Drosophila, C. elegans and humans. Conclusions: On the basis of current metazoan phylogeny, these 39 groups represent the minimum complement of C2H2 zinc-finger genes present in the genome of the bilaterian common ancestor.

19.
Shi G, Peng MC, Jiang T. PLoS ONE, 2011, 6(6): e20892
The identification of orthologous genes shared by multiple genomes plays an important role in evolutionary studies and gene functional analyses. Building on MSOAR 2.0, a recently developed accurate tool that assigns orthologs between a pair of closely related genomes based on genome rearrangement, we present a new system, MultiMSOAR 2.0, to identify ortholog groups among multiple genomes. In the system, we construct gene families for all the genomes using sequence similarity search and clustering, run MSOAR 2.0 for all pairs of genomes to obtain the pairwise orthology relationship, and partition each gene family into a set of disjoint sets of orthologous genes (called super ortholog groups or SOGs) such that each SOG contains at most one gene from each genome. For each such SOG, we label the leaves of the species tree using 1 or 0 to indicate if the SOG contains a gene from the corresponding species or not. The resulting tree is called a tree of ortholog groups (or TOGs). We then label the internal nodes of each TOG based on the parsimony principle and some biological constraints. Ortholog groups are finally identified from each fully labeled TOG. In comparison with a popular tool MultiParanoid on simulated data, MultiMSOAR 2.0 shows significantly higher prediction accuracy. It also outperforms MultiParanoid, the Roundup multi-ortholog repository and the Ensembl ortholog database in real data experiments using gene symbols as a validation tool. In addition to ortholog group identification, MultiMSOAR 2.0 also provides information about gene births, duplications and losses in evolution, which may be of independent biological interest. Our experiments on simulated data demonstrate that MultiMSOAR 2.0 is able to infer these evolutionary events much more accurately than a well-known software tool Notung. The software MultiMSOAR 2.0 is available to the public for free.

20.
We have conducted an evolutionary analysis of Notch genes of the vertebrates Danio rerio and Mus musculus to examine the expansion and diversification of the Notch family during vertebrate evolution. The existence of multiple Notch genes in vertebrate genomes suggests that the increase in Notch signaling pathways may be necessary for the additional complexity observed in the vertebrate body plan. However, orthology relationships within the vertebrate Notch family indicate that biological functions are not fixed within orthologous groups. Phylogenetic reconstruction of the vertebrate Notch family suggests that the zebrafish notch1a and 1b genes resulted from a duplication occurring around the time of the teleost/mammalian divergence. There is also evidence that the mouse Notch4 gene is the result of a rapid divergence from a Notch3-like gene. Investigation of the ankyrin repeat region sequences showed there to be little evidence for gene conversion events between repeat units. However, relationships between repeats 2-5 suggest that these repeats are the result of a tandem duplication of a dual repeat unit. Selective pressure on maintenance of ankyrin repeat sequences indicated by relationships between the repeats suggests that specific repeats are responsible for particular biological activities, a finding consistent with mutational studies of the Caenorhabditis elegans gene glp-1. Sequence similarities between the ankyrin repeats and the region immediately C-terminal of the repeats further suggest that this region may be involved in the modulation of ankyrin repeat function.
