Similar articles
 Found 20 similar articles; search took 15 ms
1.
Clustering of main orthologs for multiple genomes   Total citations: 1, self-citations: 0, citations by others: 1
The identification of orthologous genes shared by multiple genomes is critical for both functional and evolutionary studies in comparative genomics. While it is usually done by sequence similarity search and reconciled tree construction in practice, recently a new combinatorial approach and high-throughput system MSOAR for ortholog identification between closely related genomes based on genome rearrangement and gene duplication has been proposed in Fu et al. MSOAR assumes that orthologous genes correspond to each other in the most parsimonious evolutionary scenario, minimizing the number of genome rearrangement and (postspeciation) gene duplication events. However, the parsimony approach used by MSOAR limits it to pairwise genome comparisons. In this paper, we extend MSOAR to multiple (closely related) genomes and propose an ortholog clustering method, called MultiMSOAR, to infer main orthologs in multiple genomes. As a preliminary experiment, we apply MultiMSOAR to rat, mouse, and human genomes, and validate our results using gene annotations and gene function classifications in the public databases. We further compare our results to the ortholog clusters predicted by MultiParanoid, which is an extension of the well-known program InParanoid for pairwise genome comparisons. The comparison reveals that MultiMSOAR gives more detailed and accurate orthology information, since it can effectively distinguish main orthologs from inparalogs.

2.
The HGNC Comparison of Orthology Predictions search tool, HCOP, enables users to compare predicted human and mouse orthologs for a specified gene, or set of genes, from either species according to the ortholog assertions from the Ensembl, HGNC, Homologene, Inparanoid, MGI and PhIGs databases. Users can assess the reliability of the prediction from the number of these different sources that identify a particular orthologous pair. HCOP provides a useful one-stop resource to summarise, compare and access various sources of human and mouse orthology data.
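The idea of judging a predicted pair by how many sources assert it can be illustrated with a minimal Python sketch. This is not HCOP's implementation: the function name `count_supporting_sources` and the toy assertion tuples are hypothetical, and only the source names come from the abstract.

```python
from collections import defaultdict


def count_supporting_sources(assertions):
    """Map each (human_gene, mouse_gene) pair to the sorted list of sources asserting it."""
    support = defaultdict(set)
    for source, human_gene, mouse_gene in assertions:
        support[(human_gene, mouse_gene)].add(source)
    return {pair: sorted(sources) for pair, sources in support.items()}


# Hypothetical toy assertions (source, human gene, mouse gene); not real database content.
assertions = [
    ("Ensembl", "USP15", "Usp15"),
    ("MGI", "USP15", "Usp15"),
    ("Homologene", "USP15", "Usp15"),
    ("Inparanoid", "USP15", "Usp4"),
]

support = count_supporting_sources(assertions)

# Rank candidate pairs by the number of independent sources that agree on them.
ranked = sorted(support.items(), key=lambda item: -len(item[1]))
for (human_gene, mouse_gene), sources in ranked:
    print(human_gene, mouse_gene, len(sources), ",".join(sources))
```

A pair backed by several sources ranks above one asserted by a single source, which is the reliability signal the abstract describes.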

3.
4.
Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools.

5.
The non-coding fraction of the human genome, which is approximately 98%, is mainly constituted by repeats. Transpositions, expansions and deletions of these repeat elements contribute to a number of diseases. None of the available databases consolidates information on both tandem and interspersed repeats with the flexibility of FASTA based homology search with reference to disease genes. Repeats in diseases database (RiDs db) is a web accessible relational database, which aids analysis of repeats associated with Mendelian disorders. It is a repository of disease genes, which can be searched by FASTA program or by limited or free-text keywords. Unlike other databases, RiDs db contains the sequences of these genes with access to corresponding information on both interspersed and tandem repeats contained within them, on a unified platform. Comparative analysis of novel or patient sequences with the reference sequences in RiDs db using FASTA search will indicate change in structure of repeats, if any, with a particular disorder. This database also provides links to orthologs in model organisms such as zebrafish, mouse and Drosophila. AVAILABILITY: The database is available for free at http://115.111.90.196/ridsdb/index.php.

6.
The agnostic screening performed by genome-wide association studies (GWAS) has uncovered associations for previously unsuspected genes. Knowledge about the functional role of these genes is crucial and laboratory mouse models can provide such information. Here, we describe a systematic juxtaposition of human GWAS-discovered loci versus mouse models in order to appreciate the availability of mouse models data, to gain biological insights for the role of these genes and to explore the extent of concordance between these two lines of evidence. We perused publicly available data (NHGRI database for human associations and Mouse Genome Informatics database for mouse models) and employed two alternative approaches for cross-species comparisons, phenotype- and gene-centric. A total of 293 single gene-phenotype human associations (262 unique genes and 69 unique phenotypes) were evaluated. In the phenotype-centric approach, we identified all mouse models and related ortholog genes for the 51 human phenotypes with a comparable phenotype in mice. A total of 27 ortholog genes were found to be associated with the same phenotype in humans and mice, a concordance that was significantly larger than expected by chance (p<0.001). In the gene-centric approach, we were able to locate at least 1 knockout model for 60% of the 262 genes. The knockouts for 35% of these orthologs displayed pre- or post-natal lethality. For the remaining non-lethal orthologs, the same organ system was involved in mice and humans in 71% of the cases (p<0.001). Our project highlights the wealth of available information from mouse models for human GWAS, catalogues extensive information on plausible physiologic implications for many genes, provides hypothesis-generating findings for additional GWAS analyses and documents that the concordance between human and mouse genetic association is larger than expected by chance and can be informative.

7.
Using computational approaches we have identified 2017 expressed intronless genes in the mouse genome. Evolutionary analysis reveals that 56 intronless genes are conserved among the three domains of life: bacteria, archaea and eukaryotes. These highly conserved intronless genes were found to be involved in essential housekeeping functions. About 80% of expressed mouse intronless genes have orthologs in eukaryotic genomes only, and thus are specific to eukaryotic organisms. 608 of these genes have intronless human orthologs and 302 of these orthologs have a match in the OMIM database. Investigation into these mouse genes will be important in generating mouse models for understanding human diseases.

8.
Orthology is one of the most important tools available to modern biology, as it allows making inferences from easily studied model systems to much less tractable systems of interest, such as ourselves. This becomes important not least in the study of genetic diseases. We here review work on the orthology of disease-associated genes and also present an updated version of the InParanoid-based disease orthology database and web site OrthoDisease, with 14-fold increased species coverage since the previous version. Using this resource, we survey the taxonomic distribution of orthologs of human genes involved in different disease categories. The hypothesis that paralogs can mask the effect of deleterious mutations predicts that known heritable disease genes should have fewer close paralogs. We found large-scale support for this hypothesis as significantly fewer duplications were observed for disease genes in the OrthoDisease ortholog groups.

9.
Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale 'gold standard' orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity>80%: INPARANOID identifies orthologs across two species while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than the (manually curated) KOG database, and the homolog clustering algorithm TribeMCL as well. By way of using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guides for method selection, tuning and development for different applications. 
Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology.

10.
11.

Background

Orthology is a central tenet of comparative genomics and ortholog identification is instrumental to protein function prediction. Major advances have been made to determine orthology relations among a set of homologous proteins. However, they depend on the comparison of individual sequences and do not take into account divergent orthologs.

Results

We have developed an iterative orthology prediction method, Ortho-Profile, that uses reciprocal best hits at the level of sequence profiles to infer orthology. It increases ortholog detection by 20% compared to sequence-to-sequence comparisons. Ortho-Profile predicts 598 human orthologs of mitochondrial proteins from Saccharomyces cerevisiae and Schizosaccharomyces pombe with 94% accuracy. Of these, 181 were not known to localize to mitochondria in mammals. Among the predictions of the Ortho-Profile method are 11 human cytochrome c oxidase (COX) assembly proteins that are implicated in mitochondrial function and disease. Their co-expression patterns, experimentally verified subcellular localization, and co-purification with human COX-associated proteins support these predictions. For the human gene C12orf62, the ortholog of S. cerevisiae COX14, we specifically confirm its role in negative regulation of the translation of cytochrome c oxidase.

Conclusions

Divergent homologs can often only be detected by comparing sequence profiles and profile-based hidden Markov models. The Ortho-Profile method takes advantage of these techniques in the quest for orthologs.
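The reciprocal-best-hit criterion mentioned in the Results can be sketched in Python as follows. This is a generic illustration on precomputed similarity scores, not the Ortho-Profile pipeline: it does not build sequence profiles or hidden Markov models, and the function names, identifiers and scores are all hypothetical.

```python
def best_hits(scores):
    """For each query, return the target with the highest similarity score."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_to_b, b_to_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_to_b)
    reverse = best_hits(b_to_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores; in a profile-based setting these would come from
# profile-to-profile or profile-to-sequence comparisons rather than plain BLAST.
yeast_to_human = {
    ("yeastA", "humX"): 90,
    ("yeastA", "humY"): 40,
    ("yeastB", "humY"): 55,
}
human_to_yeast = {
    ("humX", "yeastA"): 88,
    ("humY", "yeastA"): 60,
    ("humY", "yeastB"): 50,
}

pairs = reciprocal_best_hits(yeast_to_human, human_to_yeast)
print(pairs)
```

In this toy input only `yeastA` and `humX` pick each other, so `yeastB` is left without a predicted ortholog even though it has a forward hit.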

12.
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics, since many computational methods for solving various biological problems critically rely on bona fide orthologs as input. While it is usually done using sequence similarity search, we recently proposed a new combinatorial approach that combines sequence similarity and genome rearrangement. This paper continues the development of the approach and unites genome rearrangement events and (post-speciation) duplication events in a single framework under the parsimony principle. In this framework, orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario involving both genome rearrangement and (post-speciation) gene duplication. Besides several original algorithmic contributions, the enhanced method allows for the detection of inparalogs. Following this approach, we have implemented a high-throughput system for ortholog assignment on a genome scale, called MSOAR, and applied it to human and mouse genomes. As the results show, MSOAR is able to find 99 more true orthologs than the INPARANOID program did. In comparison to the iterated exemplar algorithm on simulated data, MSOAR performed favorably in terms of assignment accuracy. We also validated our predicted main ortholog pairs between human and mouse using public ortholog assignment datasets, synteny information, and gene function classification. These test results indicate that our approach is very promising for genome-wide ortholog assignment. Supplemental material and MSOAR program are available at http://msoar.cs.ucr.edu.

13.
J F Houle, E C Friedberg. Gene, 1999, 234(2): 353-360
Xeroderma pigmentosum complementation group G (XPG) protein is a junction-specific endonuclease which is indispensable for nucleotide excision repair (NER) of DNA in eukaryotes. Recent studies have hinted at a second, essential function for the XPG protein in higher eukaryotes. We undertook a comparison of the amino acid sequences of multiple XPG orthologs to determine if a motif or domain could be identified that is conserved uniquely in higher eukaryotes. A search of current databases allowed us to retrieve complete amino acid sequences for the human, mouse and Xenopus XPG proteins, and for two yeast orthologs. We also identified an incomplete Drosophila open reading frame (ORF) that was a good candidate for the XPG protein. We cloned a complete Drosophila cDNA for this ORF and examination of the primary amino acid sequence suggests that this cDNA encodes the Drosophila ortholog of XPG. A comparison of all six orthologous polypeptides reveals the presence of two previously unidentified conserved domains. One of these is unique to all four higher eukaryotic sequences. Conceivably this domain evolved to support the essential function of XPG protein.

14.
MOTIVATION: Comparative sequence analysis is widely used to study genome function and evolution. This approach first requires the identification of homologous genes and then the interpretation of their homology relationships (orthology or paralogy). To provide help in this complex task, we developed three databases of homologous genes containing sequences, multiple alignments and phylogenetic trees: HOBACGEN, HOVERGEN and HOGENOM. In this paper, we present two new tools for automating the search for orthologs or paralogs in these databases. RESULTS: First, we have developed and implemented an algorithm to infer speciation and duplication events by comparison of gene and species trees (tree reconciliation). Second, we have developed a general method to search in our databases the gene families for which the tree topology matches a peculiar tree pattern. This algorithm of unordered tree pattern matching has been implemented in the FamFetch graphical interface. With the help of a graphical editor, the user can specify the topology of the tree pattern, and set constraints on its nodes and leaves. Then, this pattern is compared with all the phylogenetic trees of the database, to retrieve the families in which one or several occurrences of this pattern are found. By specifying ad hoc patterns, it is therefore possible to identify orthologs in our databases.
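Tree reconciliation can be illustrated with the textbook last-common-ancestor (LCA) mapping, sketched below in Python. This is an assumption about how such an inference might look, not the algorithm implemented by the authors: the tree encoding (nested tuples for the gene tree, a child-to-parent dictionary for the species tree), the function names and the toy gene family are hypothetical.

```python
def lca(species_parent, a, b):
    """Last common ancestor of species-tree nodes a and b (child -> parent map)."""
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = species_parent.get(node)  # the root has no parent entry
    node = b
    while node not in ancestors:
        node = species_parent[node]
    return node


def reconcile(gene_tree, gene_species, species_parent, events):
    """Map a gene-tree node to a species-tree node and label internal nodes.

    An internal node is called a duplication when it maps to the same
    species-tree node as one of its children, otherwise a speciation.
    Labels are appended to `events` in post-order.
    """
    if isinstance(gene_tree, str):  # leaf: a gene name
        return gene_species[gene_tree]
    left, right = gene_tree
    mapped_left = reconcile(left, gene_species, species_parent, events)
    mapped_right = reconcile(right, gene_species, species_parent, events)
    mapped = lca(species_parent, mapped_left, mapped_right)
    label = "duplication" if mapped in (mapped_left, mapped_right) else "speciation"
    events.append(label)
    return mapped


# Hypothetical species tree: ((human, mouse)Euarchontoglires, chicken)Amniota
species_parent = {
    "human": "Euarchontoglires",
    "mouse": "Euarchontoglires",
    "Euarchontoglires": "Amniota",
    "chicken": "Amniota",
}
# Hypothetical gene family: two human/mouse gene pairs plus one chicken gene.
gene_species = {"h1": "human", "m1": "mouse", "h2": "human", "m2": "mouse", "c1": "chicken"}
gene_tree = ((("h1", "m1"), ("h2", "m2")), "c1")

events = []
root_mapping = reconcile(gene_tree, gene_species, species_parent, events)
print(root_mapping, events)
```

Under this labeling, genes whose last common node in the gene tree is a speciation (for example `h1` and `m1`) are orthologs, while genes separated by a duplication node (for example `h1` and `m2`) are paralogs.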

15.
Park SH, Shin YK, Suh YH, Park WS, Ban YL, Choi HS, Park HJ, Jung KC. Gene, 2005, 353(2): 177-188
The human pseudoautosomal region 1 (PAR1) is essential for the obligatory X-Y crossover in male meiosis. Despite its critical role, comparative studies of human and mouse pseudoautosomal genes have been limited owing to the scarcity of genes conserved between the two species. Human CD99 is a 32-kDa cell surface protein that is encoded by the MIC2 gene localized to the PAR1. Although several sequences such as CD99L2, PBDX, and CD99L1 are related to CD99, its murine ortholog, Cd99, has not yet been identified. Here we report a novel mouse Cd99, designated D4, which shows overall sequence homology to CD99, with the highest conservation between the two genes being found in the transmembrane regions. In addition, the D4 protein displays biochemical characteristics, functional homology, and expression patterns similar to those of CD99. The D4 gene is localized on an autosome, chromosome 4, reflecting a common mapping feature with other mouse orthologs of human PAR1 genes. Furthermore, a phylogenetic analysis of CD99-related genes confirmed that the D4 gene is indeed an ortholog of CD99 and exhibits the accelerated evolution pattern of CD99 orthologs, as compared to the CD99L2 orthologs. On the basis of these findings, we suggest that CD99 belongs to the ancient PAR genes, and that the rapid interspecies divergence of its present sequence and map position is due to a high recombination frequency and the occurrence of chromosomal translocation, supporting the addition-attrition hypothesis for PAR evolution.

16.
The identification of orthologs to a set of known genes is often the starting point for evolutionary studies focused on gene families of interest. To date, the existing orthology detection tools (COG, InParanoid, OrthoMCL, etc.) are aimed at genome-wide ortholog identification and lack flexibility for the purposes of case studies. We developed a program OrthoFocus, which employs an extended reciprocal best hit approach to quickly search for orthologs in a pair of genomes. A group of paralogs from the input genome is used as the start for the forward search and the criterion for the reverse search, which allows handling many-to-one and many-to-many relationships. By pairwise comparison of genomes with the input species genome, OrthoFocus enables quick identification of orthologs in multiple genomes and generates a multiple alignment of orthologs so that it can further be used in phylogenetic analysis. The program is available at http://www.lipidomics.ru/.

17.
The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. 
We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and better functional data is needed.

18.
Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of 'Gold standard' phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.

19.
Recent additions have expanded the interleukin (IL)-1 gene family to 10 members. We have determined the order, orientation, and intergenic distance of the nine IL-1 family genes that lie on human chromosome 2. We report cDNA sequences for the mouse orthologs of three of these genes. The order and orientation of the mouse genes have been mapped, and the mouse locus compared with the human locus. There is a break in the mouse locus of > 100 kb, compared with the human locus, located between Il1b and the most centromere-proximal of the novel mouse genes. The mouse seems to be missing an ortholog of human IL1F7.

20.
We have characterized the mouse ortholog of the human ubiquitin-specific protease USP15. Mouse Usp15 consists of 981 amino acids with a predicted molecular mass of 112 kDa, contains the highly conserved Cys and His boxes present in all members of the UBP family of deubiquitinating enzymes, and is 98% identical/99% similar to human USP15. Usp15 shares 59.5% identity/75.5% sequence similarity with the mouse Unp(Usp4) oncoprotein. Recombinant Usp15 demonstrated ubiquitin-specific protease activity against engineered linear fusions of ubiquitin to glutathione S-transferase. Usp15 can also cleave the ubiquitin-proline bond, as can USP15 and Usp4. Alignment of mouse and human Usp15 and Usp4 protein sequences suggested that Usp15/USP15 may be alternately spliced in a manner analogous to Usp4. Sequence analysis of RT-PCR products from several human and mouse cell lines and tissues revealed alternate splicing in all cells studied. Northern blot analysis of both mouse and human Usp15 revealed two differently sized mRNAs in all tissues examined, owing to alternate polyadenylation sites spaced by 1.5 kb. Chromosomal mapping by interspecific backcross analysis localized the Usp15 gene to the distal region of mouse Chromosome (Chr) 10. This region is syntenic with human Chr 12q24, the location of human USP15, and a different location to Unp(Usp4) (Chr 9). Identification of the mouse Usp15 gene (>69.5 kb) and human USP15 gene (145 kb) sequences in genome databases reveals that both are composed of 22 exons with identical splice sites, and both have an exon/intron structure identical to the mouse Usp4 gene, including the alternately spliced exon. Phylogenetic studies suggest that a sequence currently identified as a chicken Usp4 ortholog is in fact a USP15 ortholog, while bona-fide chicken, cow, and rat Usp4 orthologs can be identified in EST databases.
