Similar articles
 20 similar articles found (search time: 15 ms)
1.
The proteomes that make up the collection of proteins in contemporary organisms evolved through recombination and duplication of a limited set of domains. These protein domains are essentially the main components of globular proteins and are the most fundamental level at which protein function and protein interactions can be understood. An important aspect of domain evolution is their atomic structure and biochemical function, which are both specified by the information in the amino acid sequence. Changes in this information may bring about new folds, functions and protein architectures. With the present and still increasing wealth of sequences and annotation data brought about by genomics, new evolutionary relationships are constantly being revealed, unknown structures modeled and phylogenies inferred. Such investigations not only help predict the function of newly discovered proteins, but also assist in mapping unforeseen pathways of evolution and reveal crucial, co-evolving inter- and intra-molecular interactions. In turn this will help us describe how protein domains shaped cellular interaction networks and the dynamics with which they are regulated in the cell. Additionally, these studies can be used for the design of new and optimized protein domains for therapy. In this review, we aim to describe the basic concepts of protein domain evolution and illustrate recent developments in molecular evolution that have provided valuable new insights into the field of comparative genomics and protein interaction networks.

2.
3.
Machine learning approach for the prediction of protein secondary structure   (total citations: 8; self-citations: 0; citations by others: 8)
PROMIS (protein machine induction system), a program for machine learning, was used to generalize rules that characterize the relationship between primary and secondary structure in globular proteins. These rules can be used to predict an unknown secondary structure from a known primary structure. The symbolic induction method used by PROMIS was specifically designed to produce rules that are meaningful in terms of chemical properties of the residues. The rules found were compared with existing knowledge of protein structure: some features of the rules were already recognized (e.g. amphipathic nature of alpha-helices). Other features are not understood, and are under investigation. The rules produced a prediction accuracy for three states (alpha-helix, beta-strand and coil) of 60% for all proteins, 73% for proteins of known alpha domain type, 62% for proteins of known beta domain type and 59% for proteins of known alpha/beta domain type. We conclude that machine learning is a useful tool in the examination of the large databases generated in molecular biology.
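
The three-state accuracy quoted above is usually computed as the fraction of residues whose predicted state matches the observed state. The sketch below is a minimal illustration of that calculation only; it is not PROMIS itself, and the function name `q3_accuracy` and the one-letter state codes (H for alpha-helix, E for beta-strand, C for coil) are assumptions made for the example.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose predicted state equals the observed state.

    Both arguments are strings of one-letter state codes (assumed here:
    H = alpha-helix, E = beta-strand, C = coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the two state strings agree.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

For example, a prediction that disagrees with the observed states at one residue out of eight gives an accuracy of 0.875.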

4.
In eukaryotes, the posttranslational conjugation of ubiquitin to various cellular proteins marks them for degradation. Interestingly, several proteins have been reported to contain ubiquitin-like (ub-like) domains that are in fact specified by the DNA coding sequences of the proteins. The biological role of the ub-like domain in these proteins is not known; however, it has been proposed that this domain functions as a degradation signal rendering the proteins unstable. Here, we report that the product of the Saccharomyces cerevisiae RAD23 gene, which is involved in excision repair of UV-damaged DNA, bears a ub-like domain at its amino terminus. This finding has presented an opportunity to define the functional significance of this domain. We show that deletion of the ub-like domain impairs the DNA repair function of RAD23 and that this domain can be functionally substituted by the authentic ubiquitin sequence. Surprisingly, RAD23 is highly stable, and the studies reported herein indicate that its ub-like domain does not mediate protein degradation. Thus, in RAD23 at least, the ub-like domain affects protein function in a nonproteolytic manner.

5.
6.
7.
Homer family proteins are encoded by three genes, homer1, 2 and 3. Most of these proteins are expressed constitutively in nervous systems and accumulate in postsynaptic regions. However, the functional significance of these proteins, especially the significance of the distinction among the proteins encoded by homer1, 2 and 3, is still obscure. In the present study, we isolated a cDNA clone encoding a novel protein by two-hybrid system screening using the C-terminal half of Homer2b as the bait. This protein, termed 2B28, has 297 amino acid residues and contains three major domains: a UBA domain, a coiled-coil region, and a UBX domain. When expressed in HEK293T cells, 2B28 showed colocalization with ubiquitin and enhanced the expression levels of IkappaB or Homer1a proteins, which are known to be degraded by proteasomes, indicating that 2B28 is involved in ubiquitin-proteasome functions. 2B28 specifically interacted and colocalized with Homer2 proteins, but not with Homer1 proteins. So far, we have identified no counterpart of 2B28 for Homer1 experimentally or in the protein databases. These results suggest that the specific interaction of 2B28 with Homer2 may play a role in regulation of protein degradation by ubiquitin-proteasome systems and that this function may be specific to Homer2 proteins among Homer family proteins.

8.
The overall function of a multi‐domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence alignment‐based methods commonly utilize domain‐level information and provide classification only at the level of domains. Such methods cannot take into account the contributions of the other domains in the proteins or of domain‐linker regions, and hence cannot classify multi‐domain proteins. An alignment‐free protein sequence comparison tool, CLAP (CLAssification of Proteins), was previously developed in our laboratory specifically to handle multi‐domain protein sequences without a requirement of defining domain boundaries and sequential order of domains. Through this method we aim to achieve a biologically meaningful classification scheme for multi‐domain protein sequences. In this article, CLAP‐based classification has been explored on 5 datasets of multi‐domain proteins and we present detailed analysis for proteins containing (1) Tyrosine phosphatase and (2) SH3 domain. At the domain level, the CLAP‐based classification scheme resulted in a clustering similar to that obtained from an alignment‐based method. CLAP‐based clusters obtained for full‐length datasets were shown to comprise proteins with similar functions and domain architectures. Our study demonstrates that multi‐domain proteins could be classified effectively by considering full‐length sequences without a requirement of identification of domains in the sequence.

9.
Chromosomal proteins HMG-14 and HMG-17 have a modular structure. Here we examine whether the putative nucleosome-binding domain in these proteins can function as an independent module. Mobility shift assays with recombinant HMG-17 indicate that synthetic molecules can be used to analyze the interaction of this protein with the nucleosome core. Peptides corresponding to various regions of the protein have been synthesized and their interaction with nucleosome cores analyzed by mobility shift, thermal denaturation and DNase I digestion. A 30 amino acid long peptide, corresponding to the putative nucleosome-binding domain of HMG-17, specifically shifts the mobility of cores as compared to free DNA, elevates the tm of both the premelt and main melt of the cores and protects from DNase I digestion the same nucleosomal DNA sites as the intact protein. The binding of both the peptide and the intact protein is lost upon digestion of the histone tails by trypsin. The nucleosomal binding sites of the peptide appear identical to those of the intact protein. Thus, a region of the protein can act as an independent functional domain. This supports the notion that HMG-14 and HMG-17 are modular proteins. This finding is relevant to the understanding of the function and evolution of HMG-14/-17, the only nucleosome core particle binding proteins known to date.

10.
A simple statistical method for predicting the functional differentiation of duplicate genes was developed. This method is based on the premise that the extent of functional differentiation between duplicate genes is reflected in the difference in evolutionary rate because the functional change of genes is often caused by relaxation or intensification of functional constraints. With this idea in mind, we developed a window analysis of protein sequences to identify the protein regions in which a significant rate difference exists. We applied this method to MIKC-type MADS-box proteins that control flower development in plants. We examined 23 pairs of sequences of floral MADS-box proteins from petunia and found that the rate differences for 14 pairs are significant. The significant rate differences were observed mostly in the K domain, which is important for dimerization between MADS-box proteins. These results indicate that our statistical method may be useful for predicting protein regions that are likely to be functionally differentiated. These regions may be chosen for further experimental studies.
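
The abstract above does not give the details of the window analysis, so the following is only a rough sketch of the general idea: slide a fixed-size window along two aligned duplicate sequences and compare how often each differs from a shared reference sequence. The function names, the use of a reference (for example an outgroup) sequence, and the raw difference in per-window change frequency are all assumptions for illustration; the statistical test for significance used by the authors is not reproduced here.

```python
def site_changes(seq: str, reference: str) -> list[int]:
    """Mark each aligned site with 1 if seq differs from the reference, else 0."""
    return [int(a != b) for a, b in zip(seq, reference)]


def window_rate_difference(
    copy_a: str, copy_b: str, reference: str, window: int = 5
) -> list[tuple[int, float]]:
    """Return (window start, rate of copy_a minus rate of copy_b) for every window.

    All three sequences are assumed to be pre-aligned and of equal length.
    The "rate" here is simply the fraction of sites in the window that differ
    from the reference, a hypothetical stand-in for an evolutionary rate.
    """
    if not (len(copy_a) == len(copy_b) == len(reference)):
        raise ValueError("sequences must be aligned to the same length")
    if window < 1 or window > len(reference):
        raise ValueError("window must be between 1 and the alignment length")
    changes_a = site_changes(copy_a, reference)
    changes_b = site_changes(copy_b, reference)
    results = []
    for start in range(len(reference) - window + 1):
        rate_a = sum(changes_a[start:start + window]) / window
        rate_b = sum(changes_b[start:start + window]) / window
        results.append((start, rate_a - rate_b))
    return results
```

Windows with a large absolute difference would be candidates for regions where the two duplicates evolve at different rates; deciding whether a difference is significant requires a statistical test that this sketch leaves out.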

11.
Amino acids committed to a particular function correlate tightly along evolution and tend to form clusters in the 3D structure of the protein. Consequently, a protein can be seen as a network of co-evolving clusters of residues. The goal of this work is two-fold: first, we have combined mutual information and structural data to describe the amino acid networks within a protein and their interactions. Second, we have investigated how this information can be used to improve methods of prediction of functional residues by reducing the search space. As a main result, we found that clusters of co-evolving residues related to the catalytic site of an enzyme have distinguishable topological properties in the network. We also observed that these clusters usually evolve independently, which could be related to a fail-safe mechanism. Finally, we discovered a significant enrichment of functional residues (e.g. metal binding, susceptibility to detrimental mutations) in the clusters, which could be the foundation of new prediction tools.
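
Mutual information between two columns of a multiple sequence alignment is a common way to quantify how tightly two positions co-vary. The sketch below computes the plain, uncorrected mutual information in bits from observed residue frequencies; the function name is hypothetical, and any corrections or structural filtering the authors applied are not described in the abstract and are not included.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns.

    Each column is a string holding the residue found at that position in
    every sequence of the alignment, in the same sequence order.
    """
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    freq_i = Counter(col_i)             # residue counts in column i
    freq_j = Counter(col_j)             # residue counts in column j
    freq_ij = Counter(zip(col_i, col_j))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi
```

Two columns that always change together, such as `"AACC"` and `"DDEE"`, score 1 bit, while columns that vary independently, such as `"AACC"` and `"DEDE"`, score 0.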

12.
Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between the ancestry and the functional signal is decoupled. The amino acids in these sites are masked from being recognized by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications in protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.

13.
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable towards molecular level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein-protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that the interacting domain pair(s) for a given interaction exhibit a higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain-domain interactions. Given a protein-protein interaction, the proposed method predicts the domain pair(s) most likely to mediate the protein interaction. We applied this method to the yeast interactome to predict domain-domain interactions, and used known domain-domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain-domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites.

14.

Background  

Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual proteins or within specific regions in a translated genome. Further effort is needed to identify groups of domains that span two or more proteins and are linked by a cooperative function. Such functional domain combinations can be useful for protein annotation.

15.
MOTIVATION: Characterization of a protein family by its distinct sequence domains is crucial for functional annotation and correct classification of newly discovered proteins. Conventional Multiple Sequence Alignment (MSA) based methods encounter difficulties when faced with heterogeneous groups of proteins. However, even many families of proteins that do share a common domain contain instances of several other domains, without any common underlying linear ordering. Ignoring this modularity may lead to poor or even false classification results. An automated method that can analyze a group of proteins into the sequence domains it contains is therefore highly desirable. RESULTS: We apply a novel method to the problem of protein domain detection. The method takes as input an unaligned group of protein sequences. It segments them and clusters the segments into groups sharing the same underlying statistics. A Variable Memory Markov (VMM) model is built using a Prediction Suffix Tree (PST) data structure for each group of segments. Refinement is achieved by letting the PSTs compete over the segments, and a deterministic annealing framework infers the number of underlying PST models while avoiding many inferior solutions. We show that regions of similar statistics correlate well with protein sequence domains, by matching a unique signature to each domain. This is done in a fully automated manner, and does not require or attempt an MSA. Several representative cases are analyzed. We identify a protein fusion event, refine an HMM superfamily classification into the underlying families the HMM cannot separate, and detect all 12 instances of a short domain in a group of 396 sequences. CONTACT: jill@cs.huji.ac.il; tishby@cs.huji.ac.il.

16.
Of the membrane proteins of known structure, we found that a remarkable 67% of the water soluble domains are structurally similar to water soluble proteins of known structure. Moreover, 41% of known water soluble protein structures share a domain with an already known membrane protein structure. We also found that functional residues are frequently conserved between extramembrane domains of membrane and soluble proteins that share structural similarity. These results suggest membrane and soluble proteins readily exchange domains and their attendant functionalities. The exchanges between membrane and soluble proteins are particularly frequent in eukaryotes, indicating that this is an important mechanism for increasing functional complexity. The high level of structural overlap between the two classes of proteins provides an opportunity to employ the extensive information on soluble proteins to illuminate membrane protein structure and function, for which much less is known. To this end, we employed structure guided sequence alignment to elucidate the functions of membrane proteins in the human genome. Our results bridge the gap of fold space between membrane and water soluble proteins and provide a resource for the prediction of membrane protein function. A database of predicted structural and functional relationships for proteins in the human genome is provided at sbi.postech.ac.kr/emdmp.

17.

Background

The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Due to the huge gap in the available protein sequence and structural space, tools that can generate functionally homogeneous clusters using only the sequence information, hold great importance. For this, traditional alignment-based tools work well in most cases and clustering is performed on the basis of sequence similarity. But, in the case of multi-domain proteins, the alignment quality might be poor due to varied lengths of the proteins, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature, hence alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss out on the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better.

Results

Our web-server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It utilizes a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are a part of the matched patterns between two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters that have high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corroborated the sub-family level classification of that particular domain family.

Conclusions

CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.

18.
GlobPlot: Exploring protein sequences for globularity and disorder   (total citations: 2; self-citations: 0; citations by others: 2)
A major challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Non-globular sequence segments often contain short linear peptide motifs (e.g. SH3-binding sites) which are important for protein function. We present here a new tool for discovery of such unstructured, or disordered, regions within proteins. GlobPlot (http://globplot.embl.de) is a web service that allows the user to plot the tendency within the query protein for order/globularity and disorder. We show examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, and also apparently ordered regions that do not contain any recognised domain. GlobPlot may be useful in domain hunting efforts. The plots indicate that instances of known domains may often contain additional N- or C-terminal segments that appear ordered. Thus GlobPlot may be of use in the design of constructs corresponding to globular proteins, as needed for many biochemical studies, particularly structural biology. GlobPlot has a pipeline interface, GlobPipe, for the advanced user to do whole proteome analysis. GlobPlot can also be used as a generic infrastructure package for graphical display of any possible propensity.
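
One simple way to turn a per-residue propensity into a curve that can be plotted along a sequence is a running sum, where rising stretches indicate residues with positive propensity and falling stretches indicate the opposite. The sketch below illustrates that idea only. The abstract does not describe GlobPlot's scales or algorithm, so the running sum, the function name and the `TOY_PROPENSITY` values are all assumptions made up for this example and are not GlobPlot's parameters.

```python
from itertools import accumulate

# Hypothetical per-residue values: positive = disorder-promoting,
# negative = order-promoting. These numbers are invented for illustration.
TOY_PROPENSITY = {"P": 1.0, "S": 0.5, "G": 0.5, "L": -1.0, "V": -1.0, "I": -1.0}


def cumulative_propensity(
    sequence: str, scale: dict[str, float], default: float = 0.0
) -> list[float]:
    """Return the running sum of per-residue propensities along the sequence.

    Residues missing from the scale contribute the default value.
    """
    return list(accumulate(scale.get(residue, default) for residue in sequence))
```

Plotting the returned list against residue index gives a curve whose upward slopes mark stretches enriched in positive-propensity residues; any propensity table can be passed in as `scale`.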

19.
20.
MOTIVATION: Genomic and proteomic approaches have accumulated a huge amount of data which provide clues to protein function. However, interpreting single omic data for predicting uncharacterized protein functions has been a challenging task, because the data contain a lot of false positives. To overcome this problem, methods for integrating data from various omic approaches are needed for more accurate function prediction. RESULT: In this paper, we have developed a method which extracts functionally similar proteins with high confidence by integrating protein-protein interaction data and domain information. We used this method to analyze publicly available data from Saccharomyces cerevisiae. We identified 1042 functional associations, involving 765 proteins of which 98 (12.8%) had no previously ascribed function. Our method extracts functionally similar protein pairs more accurately than conventional methods, and predicting function for previously uncharacterized proteins can be achieved. Our method can of course be applied to protein-protein interaction data for any species.
