Similar Articles
20 similar articles found (search time: 15 ms)
1.
A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the "ortholog conjecture"). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.
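To make the ortholog-versus-paralog comparison concrete, here is a minimal Python sketch of one way to score functional similarity between gene pairs. It assumes each gene is represented by a set of function annotations (for example GO terms), uses the Jaccard index of the two sets as the similarity score, and averages ortholog and paralog pairs separately. The Jaccard measure, the helper names and the toy gene and term labels are illustrative assumptions, not the metric used in the study.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index of two annotation sets (0 = disjoint, 1 = identical)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mean_pair_similarity(pairs, annotations):
    """Average Jaccard similarity over (gene1, gene2) pairs."""
    scores = [jaccard(annotations[g1], annotations[g2]) for g1, g2 in pairs]
    return sum(scores) / len(scores)


# Hypothetical genes and annotation terms, for illustration only.
annotations = {
    "human_geneA": {"term1", "term2"},
    "mouse_geneA": {"term1"},
    "human_geneB": {"term1", "term2", "term3"},
}
ortholog_pairs = [("human_geneA", "mouse_geneA")]
paralog_pairs = [("human_geneA", "human_geneB")]

print(mean_pair_similarity(ortholog_pairs, annotations))
print(mean_pair_similarity(paralog_pairs, annotations))
```

With real data, the same averages would be computed over thousands of pairs and stratified by sequence identity, as the abstract describes.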

2.
Pseudoperonospora cubensis, an obligate oomycete pathogen, is the causal agent of cucurbit downy mildew, a foliar disease of global economic importance. Similar to other oomycete plant pathogens, Ps. cubensis has a suite of RXLR and RXLR-like effector proteins, which likely function as virulence or avirulence determinants during the course of host infection. Using in silico analyses, we identified 271 candidate effector proteins with variable RXLR motifs within the Ps. cubensis genome. Extending this analysis, we present the functional characterization of one Ps. cubensis effector protein, RXLR protein 1 (PscRXLR1), and its closest Phytophthora infestans ortholog, PITG_17484, a member of the Drug/Metabolite Transporter (DMT) superfamily. To assess whether such effector/non-effector pairs are common among oomycete plant pathogens, we examined the relationship(s) among putative ortholog pairs in Ps. cubensis and P. infestans. Of 271 predicted Ps. cubensis effector proteins, only 109 (41%) had a putative ortholog in P. infestans, and evolutionary rate analysis of these orthologs shows that they are evolving significantly faster than most other genes. We found that PscRXLR1 was up-regulated during the early stages of plant infection and, moreover, that heterologous expression of PscRXLR1 in Nicotiana benthamiana elicits rapid necrosis. More interestingly, we also demonstrate that PscRXLR1 arises as a product of alternative splicing, making this the first example in plant pathogenic oomycetes of an alternative splicing event transforming a non-effector gene into a functional effector protein. Taken together, these data suggest a role for PscRXLR1 in pathogenicity, and, in total, our data provide a basis for comparative analysis of candidate effector proteins and their non-effector orthologs as a means of understanding the function and evolutionary history of pathogen effectors.

3.
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics, since many computational methods for solving various biological problems critically rely on bona fide orthologs as input. While it is usually done using sequence similarity search, we recently proposed a new combinatorial approach that combines sequence similarity and genome rearrangement. This paper continues the development of the approach and unites genome rearrangement events and (post-speciation) duplication events in a single framework under the parsimony principle. In this framework, orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario involving both genome rearrangement and (post-speciation) gene duplication. Besides several original algorithmic contributions, the enhanced method allows for the detection of inparalogs. Following this approach, we have implemented a high-throughput system for genome-scale ortholog assignment, called MSOAR, and applied it to the human and mouse genomes. As the results show, MSOAR is able to find 99 more true orthologs than the INPARANOID program. In comparison to the iterated exemplar algorithm on simulated data, MSOAR performed favorably in terms of assignment accuracy. We also validated our predicted main ortholog pairs between human and mouse using public ortholog assignment datasets, synteny information, and gene function classification. These test results indicate that our approach is very promising for genome-wide ortholog assignment. Supplemental material and the MSOAR program are available at http://msoar.cs.ucr.edu.

4.
While comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry, experimental validation of the existence of this core genome requires extensive measurement and is typically not undertaken. Enabled by an extensive proteome database developed over six years, we have experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. Although genomic studies can establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits.

5.
Fliess A, Motro B, Unger R. Proteins, 2002, 48(2): 377-387
An important question in protein evolution is to what extent proteins may have undergone swaps (switches of domain or fragment order) during evolution. Such events might have occurred in several forms: swaps of short fragments, swaps of structural and functional motifs, or recombination of domains in multidomain proteins. This question is important for the theoretical understanding of the evolution of proteins, and has practical implications for using swaps as a design tool in protein engineering. In order to analyze the question systematically, we conducted a large-scale survey of possible swaps and permutations among all pairs of proteins from the Swiss-Prot database. A swap is defined as a specific kind of sequence mutation between two proteins in which two fragments that appear in both sequences have different relative order in the two sequences. For example, aXbYc and dYeXf are defined as a swap, where X and Y represent sequence fragments that switched their order. Identifying such swaps is difficult using standard sequence comparison packages. One of the main problems in the analysis stems from the fact that many sequences contain repeats, which may be identified as false-positive swaps. We used two different approaches to detect pairs of proteins with swaps. The first approach is based on the predefined list of domains in Pfam. We identified all the proteins that share at least two domains and analyzed their relative order, looking for pairs in which the order of these domains was switched. We designed an algorithm to distinguish between real swaps and duplications. In the second approach, we used BLAST to detect pairs of proteins that share several fragments. Then, we used an automatic procedure to select pairs that are likely to contain swaps. Those pairs were analyzed visually, using a graphical tool, to eliminate duplications.
Combining these approaches, about 140 different cases of swaps in the Swiss-Prot database were found (after eliminating multiple pairs within the same family). Some of the cases have been described in the literature, but many are novel examples. Although each new example identified may be interesting to analyze, our main conclusion is that cases of swaps are rare in protein evolution. This observation is at odds with the common view that proteins are very modular to the point that modules (e.g., domains) can be shuffled between proteins with minimal constraints. Our study suggests that sequential constraints, i.e., the relative order between domains, are highly conserved.
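The swap definition above (aXbYc versus dYeXf) can be illustrated with a minimal Python sketch. It assumes each protein has already been reduced to an ordered list of domain or fragment labels (for example Pfam identifiers), and it simply skips any label that occurs more than once in either protein, a crude stand-in for the authors' repeat/duplication filter. The function name `find_swaps` is hypothetical.

```python
from itertools import combinations


def find_swaps(domains_a, domains_b):
    """Return pairs of shared labels whose relative order differs between two proteins.

    Labels repeated within either protein are ignored, because repeats are a
    main source of false-positive swaps.
    """
    # Keep only labels that occur exactly once in each protein.
    unique_a = {d for d in domains_a if domains_a.count(d) == 1}
    unique_b = {d for d in domains_b if domains_b.count(d) == 1}
    shared = sorted(unique_a & unique_b)
    pos_a = {d: domains_a.index(d) for d in shared}
    pos_b = {d: domains_b.index(d) for d in shared}
    swaps = []
    for x, y in combinations(shared, 2):
        # A swap: X precedes Y in one protein but follows it in the other.
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y]):
            swaps.append((x, y))
    return swaps


# aXbYc versus dYeXf: the shared fragments X and Y switch order.
print(find_swaps(["a", "X", "b", "Y", "c"], ["d", "Y", "e", "X", "f"]))
```

Checking every pair of shared labels is quadratic in the number of shared domains, which is unproblematic for the handful of domains typical of one protein.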

6.
7.
8.
We present the first large-scale survey of N-terminal protein maturation in archaea based on 873 proteomically identified N-terminal peptides from the two haloarchaea Halobacterium salinarum and Natronomonas pharaonis. The observed protein maturation pattern can be attributed to the combined action of methionine aminopeptidase and N-terminal acetyltransferase and applies to cytosolic proteins as well as to a large fraction of integral membrane proteins. Both N-terminal maturation processes depend primarily on the amino acid in the penultimate position, in which serine and threonine residues are overrepresented. Removal of the initiator methionine occurs in two-thirds of the haloarchaeal proteins and requires a small penultimate residue, indicating that methionine aminopeptidase specificity is conserved across all domains of life. While N-terminal acetylation is rare in bacteria, our proteomic data show that acetylated N termini are common in archaea, affecting about 15% of the proteins and revealing a distinct archaeal N-terminal acetylation pattern. Haloarchaeal N-terminal acetyltransferase shows narrow substrate specificity, limited to cleaved N termini starting with serine or alanine residues. A comparative analysis of 140 ortholog pairs with identified N-terminal peptides showed that acetylatable N-terminal residues are predominantly conserved between the two haloarchaea. Only a few exceptions to the general N-terminal acetylation pattern were observed; these probably represent protein-specific modifications, as they were confirmed by ortholog comparison.
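The two-step maturation pattern described above can be expressed as a simple rule-based predictor. This is a sketch under stated assumptions: the set of "small" penultimate residues that permit methionine removal (Gly, Ala, Ser, Cys, Thr, Pro, Val) is the methionine aminopeptidase specificity commonly reported for other organisms, not a list taken from this study; acetylation is applied only to cleaved N termini starting with Ser or Ala, as the abstract states. The function `predict_n_terminus` is hypothetical and ignores the protein-specific exceptions mentioned.

```python
# Assumption: small penultimate residues that allow initiator-Met removal.
SMALL_PENULTIMATE = set("GASCTPV")
# Per the abstract: the haloarchaeal acetyltransferase acts on cleaved
# N termini starting with serine or alanine.
ACETYLATABLE = set("SA")


def predict_n_terminus(sequence):
    """Return (mature_sequence, is_acetylated) for a protein sequence."""
    if not sequence.startswith("M") or len(sequence) < 2:
        return sequence, False
    if sequence[1] in SMALL_PENULTIMATE:
        mature = sequence[1:]  # initiator methionine removed
        return mature, mature[0] in ACETYLATABLE
    return sequence, False     # methionine retained, so not acetylated here


for seq in ["MSKIE", "MAVLT", "MPQRS", "MKLLV"]:
    print(seq, predict_n_terminus(seq))
```

Comparing such rule-based predictions against proteomically observed N termini is one way the exceptions noted in the abstract would surface.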

9.
To understand the function of protein complexes and their association with biological processes, many studies have analyzed protein-protein interaction (PPI) networks. However, advances in high-throughput technology have produced a vast amount of data for analysis. Moreover, the high level of noise, sparseness, and skewed degree distribution of PPI networks limit the performance of many clustering algorithms and further analysis of their interactions. To address these problems, we present a novel random-walk-based algorithm that converts the incomplete, binary PPI network into a protein-protein topological similarity matrix (PP-TS matrix). We believe that if two proteins share high-order topological similarities, they are likely to interact with each other. Using the obtained PP-TS matrix, we constructed weighted networks to further study and analyze the interactions among proteins. Specifically, we applied a fully automated community-structure-finding algorithm (Auto-HQcut) to the resulting weighted network to cluster protein complexes. We then analyzed the protein complexes for significance in biological processes. To help visualize and analyze these protein complexes, we also developed an interface that displays the resulting complexes together with the characteristics associated with each complex. Applying our approach to a yeast protein-protein interaction network, we found that the predicted protein-protein interaction pairs with high topological similarities have more significant biological relevance than the original protein-protein interaction pairs. When we compared our PPI network reconstruction algorithm with other existing algorithms using gene ontology and gene co-expression, our algorithm produced the highest similarity scores. Our predicted protein complexes also achieved higher accuracy than the other protein complex predictions.
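The abstract does not give its random-walk formulation, so the following sketch swaps in random walk with restart (RWR), a common way to turn a binary network into a graded topological similarity matrix. For each source protein, the walk repeatedly spreads probability to interaction partners and returns to the source with probability `restart`; the stationary visit probabilities, symmetrized, serve as pairwise similarities. The restart probability, iteration count and function name are assumptions for illustration; pure Python is used so the sketch has no dependencies.

```python
def rwr_similarity(adjacency, restart=0.3, iterations=100):
    """Random-walk-with-restart similarity for an undirected, unweighted network.

    adjacency: dict mapping each protein to the set of its interaction partners
    (every partner must also be a key). Returns a symmetric dict-of-dicts matrix.
    """
    nodes = sorted(adjacency)
    walks = {}
    for source in nodes:
        # All probability mass starts on the source protein.
        p = {n: 1.0 if n == source else 0.0 for n in nodes}
        for _ in range(iterations):
            nxt = {n: 0.0 for n in nodes}
            for n in nodes:
                partners = adjacency[n]
                if partners:
                    share = (1 - restart) * p[n] / len(partners)
                    for m in partners:
                        nxt[m] += share
                else:
                    # Isolated protein: send its mass back to the source.
                    nxt[source] += (1 - restart) * p[n]
            nxt[source] += restart  # restart step
            p = nxt
        walks[source] = p
    # Symmetrize so that sim(a, b) == sim(b, a).
    return {a: {b: (walks[a][b] + walks[b][a]) / 2 for b in nodes} for a in nodes}


# Toy network: a triangle A-B-C with a tail C-D-E.
network = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}
sim = rwr_similarity(network)
print(round(sim["A"]["B"], 3), round(sim["A"]["E"], 3))
```

Proteins in the same densely connected region (A and B) score higher than distant ones (A and E); thresholding or weighting edges by these scores gives the kind of weighted network the abstract feeds into clustering.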

10.
11.
A proteome-wide protein-protein interaction (PPI) network of Methanobrevibacter ruminantium M1 (MRU), a predominant rumen methanogen, was constructed from its metabolic genes using a gene neighborhood algorithm and then compared with those of closely related rumen methanogens. Using this proteome-wide PPI approach, we constructed a network encompassing 2194 edges and 637 nodes interacting with 634 genes. Network quality and the robustness of functional modules were assessed with gene ontology terms. A structure-function-metabolism mapping was carried out for each protein, with efforts to extract concomitant experimental PPI information from the literature. The results of our study revealed that some topological properties of the network were robust in sharing homologous protein interactions across heterotrophic and hydrogenotrophic methanogens. The MRU proteome was shown to establish many PPI sub-networks for the associated metabolic subsystems required to survive in the rumen environment. The MRU genome was found to share interacting proteins from its PPI network involved in metabolic subsystems specific to heterotrophic and hydrogenotrophic methanogens. Across these proteomes, interacting proteins from the differential PPI networks were shared for the biosynthesis of amino acids, nucleosides, and nucleotides and for energy metabolism, with a larger fraction of protein pairs shared with Methanosarcina acetivorans. Our comparative study advances our understanding of the complex proteome network associated with typical metabolic subsystems of MRU and should help improve its genome-scale reconstruction in the future.

12.
A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is the use of the user's knowledge to assign high confidence to selected functional identifications. We demonstrate the algorithm by retrieving functional equivalents for 7 membrane proteins from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mappings between proteins from different species. Based on that comparison, we believe that incorporating the user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.

13.
The formation of proteins into stable protein complexes plays a fundamental role in the operation of the cell. The study of the degree of evolutionary conservation of protein complexes between species and the evolution of protein-protein interactions has been hampered by the limited coverage of the high-throughput (HTP) technologies that measure the interactome. We show that new high-throughput datasets on protein co-purification in yeast have a substantially lower false negative rate than previous datasets when compared to known complexes. These datasets are therefore more suitable for estimating the conservation of protein complex membership than was hitherto possible. We perform comparative genomics between curated protein complexes from human and the HTP data in Saccharomyces cerevisiae to study the evolution of co-complex memberships. This analysis revealed that out of the 5,960 protein pairs that are part of the same complex in human, 2,216 are absent because both proteins lack an ortholog in S. cerevisiae, while for 1,828 the co-complex membership is disrupted because one of the two proteins lacks an ortholog. For the remaining 1,916 protein pairs, only 10% were never co-purified in the large-scale experiments. This implies a conservation level of co-complex membership of 90% when the genes coding for the protein pairs that participate in the same protein complex are also conserved. We conclude that the evolutionary dynamics of protein complexes are, by and large, not the result of network rewiring (i.e. acquisition or loss of co-complex memberships), but mainly due to genomic acquisition or loss of genes coding for subunits. We thus reveal evidence for the tight interrelation of genomic and network evolution.
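The bookkeeping behind the 90% figure can be checked with a few lines of Python. The counts are those given in the abstract; the 10% value is the rounded fraction it reports, so the derived conservation level is approximate.

```python
total_pairs = 5960        # human protein pairs in the same complex
both_missing = 2216       # neither protein has an S. cerevisiae ortholog
one_missing = 1828        # exactly one protein lacks an ortholog

# Pairs where both genes are conserved in yeast.
both_conserved = total_pairs - both_missing - one_missing

never_copurified_fraction = 0.10   # "only 10%", as reported (rounded)
conservation_level = 1 - never_copurified_fraction

print(both_conserved)                 # 1916
print(f"{conservation_level:.0%}")    # 90%
```

The three categories partition all 5,960 pairs, which is why the 1,916 conserved pairs follow directly from the other two counts.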

14.
How gene function evolves is a central question of evolutionary biology. It can be investigated by comparing functional genomics results between species and between genes. Most comparative studies of functional genomics have used pairwise comparisons. Yet it has been shown that this can provide biased results, as genes, like species, are phylogenetically related. Phylogenetic comparative methods should be used to correct for this, but they depend on strong assumptions, including unbiased tree estimates relative to the hypothesis being tested. Such methods have recently been used to test the "ortholog conjecture," the hypothesis that functional evolution is faster in paralogs than in orthologs. Although pairwise comparisons of tissue specificity (τ) provided support for the ortholog conjecture, phylogenetic independent contrasts did not. Our reanalysis of the same gene trees identified problems with the time calibration of duplication nodes. We find that the gene trees used suffer from important biases, due to the inclusion of trees with no duplication nodes, to the relative age of speciations and duplications, to systematic differences in branch lengths, and to non-Brownian motion of tissue specificity on many trees. We find that incorrect implementation of phylogenetic methods on empirical gene trees with duplications can be problematic. Controlling for these biases allows successful use of phylogenetic methods to study the evolution of gene function and provides some support for the ortholog conjecture using three different phylogenetic approaches.
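The abstract does not define the tissue specificity index τ; the sketch below uses the widely used definition τ = Σᵢ(1 − x̂ᵢ)/(n − 1), where xᵢ is expression in tissue i, x̂ᵢ = xᵢ / maxⱼ xⱼ, and n is the number of tissues. It ranges from 0 (uniform expression) to 1 (expression in a single tissue). Whether the studies discussed applied exactly this form (for example on log-transformed values) is an assumption here.

```python
def tissue_specificity_tau(expression):
    """Tissue specificity index tau.

    expression: non-negative expression values, one per tissue.
    Returns 0 for uniform expression and 1 for expression in a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    peak = max(expression)
    if peak == 0:
        raise ValueError("gene is not expressed in any tissue")
    # Normalize each value by the maximum, then average (1 - x_hat) over n - 1.
    return sum(1 - x / peak for x in expression) / (n - 1)


print(tissue_specificity_tau([5, 5, 5, 5]))   # 0.0
print(tissue_specificity_tau([0, 0, 9, 0]))   # 1.0
```

A pairwise test of the ortholog conjecture would compare the differences in τ within ortholog pairs against those within paralog pairs; the phylogenetic approaches discussed above instead model τ along the gene tree.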

15.
DNA-binding proteins (DBPs) participate in various crucial processes in the life cycle of cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features such as the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all of them correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
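The evaluation described above rests on two standard pieces: splitting the 248 proteins (138 DBPs plus 110 non-binders) into 10 folds, and computing sensitivity (the fraction of DBPs predicted as DNA-binding) and specificity (the fraction of non-binders predicted as non-binding). The sketch below shows both in plain Python. The contiguous fold split is a simplification; in practice the data would be shuffled and usually stratified by class before splitting. Function names are hypothetical, and no classifier is included.

```python
def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k contiguous, nearly equal folds."""
    folds, start = [], 0
    for i in range(k):
        # The first (n_samples % k) folds get one extra sample.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for boolean labels (True = binds DNA)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


folds = k_fold_indices(138 + 110, k=10)
print([len(f) for f in folds])
print(sensitivity_specificity([True, True, False, False],
                              [True, False, False, False]))
```

Each fold serves once as the test set while the classifier is trained on the other nine; pooling the held-out predictions gives the cross-validated sensitivity and specificity.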

16.
Glycosylphosphatidylinositol‐anchored proteins (GPI‐APs) are a class of lipid anchored proteins expressed on the cell surface of eukaryotes. The potential interaction of GPI‐APs with ordered lipid domains enriched in cholesterol and sphingolipids has been proposed to function in the intracellular transport of these lipid anchored proteins. Here, we examined the biological importance of two saturated fatty acids present in the phosphatidylinositol moiety of GPI‐APs. These fatty acids are introduced by the action of lipid remodeling enzymes and are required for the GPI‐AP association within ordered lipid domains. We found that the fatty acid remodeling is not required for either efficient Golgi‐to‐plasma membrane transport or selective endocytosis via the GPI‐enriched early endosomal compartment (GEEC)/clathrin‐independent carrier (CLIC) pathway, whereas cholesterol depletion significantly affects both pathways independent of their fatty acid structure. Therefore, the mechanism of cholesterol dependence does not appear to be related to the interaction with ordered lipid domains mediated by two saturated fatty acids. Furthermore, cholesterol extraction drastically releases the unremodeled GPI‐APs carrying an unsaturated fatty acid from the cell surface, but not remodeled GPI‐APs carrying two saturated fatty acids. This underscores the essential role of lipid remodeling in ensuring a stable membrane association of GPI‐APs, particularly under potential membrane lipid perturbation.

17.
Glycosylphosphatidylinositol‐anchored proteins (GPI‐APs) are an important class of glycoproteins that are tethered to the surface of mammalian cells via the lipid GPI. GPI‐APs have been implicated in many important cellular functions including cell adhesion, cell signaling, and immune regulation. Proteomic identification of mammalian GPI‐APs en masse has been limited technically by poor sensitivity for these low abundance proteins and the use of methods that destroy cell integrity. Here, we present methodology that permits identification of GPI‐APs liberated directly from the surface of intact mammalian cells through exploitation of their appended glycans to enrich for these proteins ahead of LC‐MS/MS analyses. We validate our approach in HeLa cells, identifying a greater number of GPI‐APs from intact cells than has been previously identified from isolated HeLa membranes and a lipid raft preparation. We further apply our approach to define the cohort of endogenous GPI‐APs that populate the distinct apical and basolateral membrane surfaces of polarized epithelial cell monolayers. Our approach provides a new method to achieve greater sensitivity in the identification of low abundance GPI‐APs from the surface of live cells, and the nondestructive nature of the method provides new opportunities for the temporal or spatial analysis of cellular GPI‐AP expression and dynamics.

18.

Background  

The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to qualify ortholog identification methods.

19.
ADP-glucose pyrophosphorylase (AGPase), a key enzyme involved in higher plant starch biosynthesis, is composed of pairs of large (LS) and small subunits (SS). Ample evidence has shown that AGPase catalyzes the rate-limiting step in starch biosynthesis in higher plants. In this study, we compiled detailed comparative information about ADP-glucose pyrophosphorylase in selected plants by analyzing their structural features (e.g. amino acid content, physico-chemical properties, and secondary structural features) and phylogenetic classification. Functional analysis of these proteins included the identification of important motifs 10 to 20 amino acids long; such motifs arise because specific residues and regions that are important for the biological function of a group of proteins are conserved in both structure and sequence during evolution. Phylogenetic analysis reveals two main clusters: cluster I encompasses the large subunits (LS), while cluster II contains the small subunits (SS).

20.
A key complication in comparative genomics for reliable gene function prediction is the existence of duplicated genes. To study the effect of gene duplication on function prediction, we analyze orthologs between pairs of genomes where in one genome the orthologous gene has duplicated after the speciation of the two genomes (i.e. inparalogs). For these duplicated genes, we investigate whether the gene that is most similar at the sequence level is also the gene that has retained the ancestral gene neighborhood. Although the majority of investigated cases show a consistent pattern between sequence similarity and gene-neighborhood conservation, a substantial fraction, 29–38%, is inconsistent. The inconsistency is not a chance outcome owing to a lack of divergence time between inparalogs; rather, it appears to be a chance outcome caused by very similar rates of sequence evolution of both inparalogs relative to their ortholog. If one-to-one orthologous relationships are required, it is advisable to combine contextual information (i.e. gene neighborhood in prokaryotes and co-expression in eukaryotes) with protein sequence information to predict the most probable functionally equivalent ortholog in the presence of inparalogs.
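The consistency test described above can be sketched in Python: for each ortholog that has two or more inparalogs in the other genome, check whether the inparalog with the highest sequence identity is also the one with the most conserved neighbouring genes. The data model (an identity score and a count of conserved neighbours per inparalog), the tie-breaking (Python's `max` keeps the first maximum) and all names are hypothetical. `pick_functional_equivalent` illustrates one simple way to combine context with sequence, as the abstract recommends, not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass
class Inparalog:
    name: str
    identity_to_ortholog: float   # sequence identity to the ortholog in the other genome
    conserved_neighbours: int     # neighbouring genes shared with the ortholog's neighbourhood


def is_consistent(inparalogs):
    """True when the most sequence-similar inparalog also keeps the ancestral neighbourhood."""
    best_by_sequence = max(inparalogs, key=lambda g: g.identity_to_ortholog)
    best_by_context = max(inparalogs, key=lambda g: g.conserved_neighbours)
    return best_by_sequence.name == best_by_context.name


def inconsistent_fraction(cases):
    """Fraction of cases (lists of inparalogs) in which the two criteria disagree."""
    return sum(not is_consistent(c) for c in cases) / len(cases)


def pick_functional_equivalent(inparalogs):
    """Prefer conserved context; use sequence identity only to break ties."""
    return max(inparalogs, key=lambda g: (g.conserved_neighbours, g.identity_to_ortholog))


# Hypothetical cases: the second one is inconsistent.
cases = [
    [Inparalog("b1", 0.82, 4), Inparalog("b2", 0.61, 0)],
    [Inparalog("b3", 0.74, 0), Inparalog("b4", 0.73, 3)],
]
print(inconsistent_fraction(cases))
print(pick_functional_equivalent(cases[1]).name)
```

In the second case the two inparalogs have nearly identical sequence identity to their ortholog, which mirrors the abstract's explanation of why sequence alone can pick the "wrong" copy.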


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417