首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the "ortholog conjecture"). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.  相似文献   

2.
In the field of phylogenetics and comparative genomics, it is important to establish orthologous relationships when comparing homologous sequences. Due to the slight sequence dissimilarity between orthologs and paralogs, it is prone to regarding paralogs as orthologs. For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Depending on their algorithmic implementations, each of these methods sometimes has increased false negative or false positive rates. Here, we developed a novel algorithm for orthology detection that uses a distance method based on the phylogenetic criterion of minimum evolution. Our algorithm assumes that sets of sequences exhibiting orthologous relationships are evolutionarily less costly than sets that include one or more paralogous relationships. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Unlike tree reconciliation, our algorithm appears free from the problem of incorrect topologies of species and gene trees. The reliability of the algorithm was tested in a comparative analysis with two other orthology detection methods using 95 manually curated KOG datasets and 21 experimentally verified EXProt datasets. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.  相似文献   

3.
The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and better functional data is needed.  相似文献   

4.
Conservation of adjacency as evidence of paralogous operons   总被引:5,自引:2,他引:3  
  相似文献   

5.
6.
It is commonly believed that similarities between the sequences of two proteins infer similarities between their structures. Sequence alignments reliably recognize pairs of protein of similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30% and little is known then about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.  相似文献   

7.
Assignment of orthologous genes via genome rearrangement   总被引:1,自引:0,他引:1  
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at a genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving scenario under genome rearrangement. First, the problem is formulated as that of computing the signed reversal distance with duplicates between the two genomes of interest. Then, the problem is decomposed into two new optimization problems, called minimum common partition and maximum cycle decomposition, for which efficient heuristic algorithms are given. Following this approach, we have implemented a high-throughput system for assigning orthologs on a genome scale, called SOAR, and tested it on both simulated data and real genome sequence data. Compared to a recent ortholog assignment method based entirely on homology search (called INPARANOID), SOAR shows a marginally better performance in terms of sensitivity on the real data set because it is able to identify several correct orthologous pairs that are missed by INPARANOID. The simulation results demonstrate that SOAR, in general, performs better than the iterated exemplar algorithm in terms of computing the reversal distance and assigning correct orthologs.  相似文献   

8.
Measuring in a quantitative, statistical sense the degree to which structural and functional information can be "transferred" between pairs of related protein sequences at various levels of similarity is an essential prerequisite for robust genome annotation. To this end, we performed pairwise sequence, structure and function comparisons on approximately 30,000 pairs of protein domains with known structure and function. Our domain pairs, which are constructed according to the SCOP fold classification, range in similarity from just sharing a fold, to being nearly identical. Our results show that traditional scores for sequence and structure similarity have the same basic exponential relationship as observed previously, with structural divergence, measured in RMS, being exponentially related to sequence divergence, measured in percent identity. However, as the scale of our survey is much larger than any previous investigations, our results have greater statistical weight and precision. We have been able to express the relationship of sequence and structure similarity using more "modern scores," such as Smith-Waterman alignment scores and probabilistic P-values for both sequence and structure comparison. These modern scores address some of the problems with traditional scores, such as determining a conserved core and correcting for length dependency; they enable us to phrase the sequence-structure relationship in more precise and accurate terms. We found that the basic exponential sequence-structure relationship is very general: the same essential relationship is found in the different secondary-structure classes and is evident in all the scoring schemes. To relate function to sequence and structure we assigned various levels of functional similarity to the domain pairs, based on a simple functional classification scheme. This scheme was constructed by combining and augmenting annotations in the enzyme and fly functional classifications and comparing subsets of these to the Escherichia coli and yeast classifications. We found sigmoidal relationships between similarity in function and sequence, with clear thresholds for different levels of functional conservation. For pairs of domains that share the same fold, precise function appears to be conserved down to approximately 40 % sequence identity, whereas broad functional class is conserved to approximately 25 %. Interestingly, percent identity is more effective at quantifying functional conservation than the more modern scores (e.g. P-values). Results of all the pairwise comparisons and our combined functional classification scheme for protein structures can be accessed from a web database at http://bioinfo.mbb.yale.edu/alignCopyright 2000 Academic Press.  相似文献   

9.
Olfactory receptors (ORs) are a large family of proteins involved in the recognition and discrimination of numerous odorants. These receptors belong to the G-protein coupled receptor (GPCR) hyperfamily, for which little structural data are available. In this study we predict the binding site residues of OR proteins by analyzing a set of 1441 OR protein sequences from mouse and human. The central insight utilized is that functional contact residues would be conserved among pairs of orthologous receptors, but considerably less conserved among paralogous pairs. Using judiciously selected subsets of 218 ortholog pairs and 518 paralog pairs, we have identified 22 sequence positions that are both highly conserved among the putative orthologs and variable among paralogs. These residues are disposed on transmembrane helices 2 to 7, and on the second extracellular loop of the receptor. Strikingly, although the prediction makes no assumption about the location of the binding site, these amino acid positions are clustered around a pocket in a structural homology model of ORs, mostly facing the inner lumen. We propose that the identified positions constitute the odorant binding site. This conclusion is supported by the observation that all but one of the predicted binding site residues correspond to ligand-contact positions in other rhodopsin-like GPCRs.  相似文献   

10.

Background  

Orthology is one of the cornerstones of gene function prediction. Dividing the phylogenetic relations between genes into either orthologs or paralogs is however an oversimplification. Already in two-species gene-phylogenies, the complicated, non-transitive nature of phylogenetic relations results in inparalogs and outparalogs. For situations with more than two species we lack semantics to specifically describe the phylogenetic relations, let alone to exploit them. Published procedures to extract orthologous groups from phylogenetic trees do not allow identification of orthology at various levels of resolution, nor do they document the relations between the orthologous groups.  相似文献   

11.
Accurate inference of orthologous genes is a pre-requisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal for the purpose of orthology identification in principle, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple 'tree-like' mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny also can aid in identification of orthologs. Often, tree-based, sequence similarity- and synteny-based approaches can be combined into flexible hybrid methods.  相似文献   

12.
13.
As the result of the EUROIMAGE Consortium sequencing project, we have isolated and characterized a novel gene on chromosome 15, TM6SF1. It encodes a 370 amino acid product with enhanced expression in spleen, testis and peripheral blood leukocytes. We have identified another gene, paralogous to TM6SF1 on chromosome 19p12, TM6SF2, with an overall similarity of 68% and 52% identity at the protein level. This conservation has led us to uncover a series of eleven genes in 19p13.3-->p12 with close homology to genes in 15q24--> q26. The percentage of sequence similarity between each paralogous pair of genes at the protein level ranges between 43 and 89%. A partial conservation of synteny with mouse chromosomes 7, 8 and 9 is also observed. The corresponding orthologous genes in mouse of human TM6SF1 and TM6SF2 show a high degree of amino acid sequence conservation.  相似文献   

14.

Background  

The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to qualify ortholog identification methods.  相似文献   

15.
Here, we discuss the relationship between protein sequence and protein structural similarity. It is established that a protein structural distance (PSD) of 2.0 is a threshold above which two proteins are unlikely to have a detectable pairwise sequence relationship. A precise correlation is established between the level of sequence similarity, defined by a normalized Smith-Waterman score, and the probability that two proteins will have a similar structure (defined by pairwise PSD<2). This correlation can be used in evaluating the likelihood for success in a comparative modeling procedure. We establish the existence of a correlation between sequence and structural similarity for pairs of proteins that are related in structure but whose sequence relationship is not detectable using standard pairwise sequence alignments. Although it is well known that there is a close relationship between sequence and structural similarity for pairwise sequence identities greater than about 30 %, there has been little discussion as to the possible existence of such a relationship for pairs of proteins in or below the twilight zone of sequence similarity (<25 % pairwise sequence identity). Possible implications of our results for the evolution of protein structure are discussed.  相似文献   

16.
Twilight zone of protein sequence alignments   总被引:38,自引:0,他引:38  
Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20-35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.  相似文献   

17.
We demonstrate that protein–protein interaction networks in several eukaryotic organisms contain significantly more self-interacting proteins than expected if such homodimers randomly appeared in the course of the evolution. We also show that on average homodimers have twice as many interaction partners than non-self-interacting proteins. More specifically, the likelihood of a protein to physically interact with itself was found to be proportional to the total number of its binding partners. These properties of dimers are in agreement with a phenomenological model, in which individual proteins differ from each other by the degree of their ‘stickiness’ or general propensity toward interaction with other proteins including oneself. A duplication of self-interacting proteins creates a pair of paralogous proteins interacting with each other. We show that such pairs occur more frequently than could be explained by pure chance alone. Similar to homodimers, proteins involved in heterodimers with their paralogs on average have twice as many interacting partners than the rest of the network. The likelihood of a pair of paralogous proteins to interact with each other was also shown to decrease with their sequence similarity. This points to the conclusion that most of interactions between paralogs are inherited from ancestral homodimeric proteins, rather than established de novo after duplication. We finally discuss possible implications of our empirical observations from functional and evolutionary standpoints.  相似文献   

18.
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology—evolutionary relatedness—is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit—from domains to networks. This meeting brought into light several challenges to come: leveraging orthology computations to get the most of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.  相似文献   

19.
20.
The function of most proteins is not determined experimentally, but is extrapolated from homologs. According to the "ortholog conjecture", or standard model of phylogenomics, protein function changes rapidly after duplication, leading to paralogs with different functions, while orthologs retain the ancestral function. We report here that a comparison of experimentally supported functional annotations among homologs from 13 genomes mostly supports this model. We show that to analyze GO annotation effectively, several confounding factors need to be controlled: authorship bias, variation of GO term frequency among species, variation of background similarity among species pairs, and propagated annotation bias. After controlling for these biases, we observe that orthologs have generally more similar functional annotations than paralogs. This is especially strong for sub-cellular localization. We observe only a weak decrease in functional similarity with increasing sequence divergence. These findings hold over a large diversity of species; notably orthologs from model organisms such as E. coli, yeast or mouse have conserved function with human proteins.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号