Similar literature
Found 20 similar documents (search time: 31 ms)
1.
The identification and annotation of protein domains is a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be hindered by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detecting domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within ±20 residues of the linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two-domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple but effective method by which domain boundaries can be predicted with reasonable sensitivity.
Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. Used as a first-line predictor, Armadillo could also improve the results of domain meta-predictors.
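The profile-and-Z-score idea in this abstract can be illustrated with a minimal sketch. It is not the Armadillo implementation: the index values in TOY_INDEX are invented for illustration (they are not the published DLI), and the window size, the Z-score cutoff, and all function names are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical index values for illustration only; NOT the published DLI.
TOY_INDEX = {"P": 1.0, "G": 0.8, "S": 0.3, "A": -0.5, "V": -0.8, "L": -0.6}


def smoothed_profile(seq, index, window=3, default=0.0):
    """Map each residue to its index value, then average over a centered window."""
    raw = [index.get(res, default) for res in seq]
    half = window // 2
    profile = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        profile.append(mean(raw[lo:hi]))  # window is truncated at the sequence ends
    return profile


def z_scores(values):
    """Standardize values; return all zeros if the profile is flat."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def candidate_linker_positions(seq, index, window=3, z_cut=1.0):
    """Return 0-based positions whose smoothed score has a Z-score >= z_cut."""
    z = z_scores(smoothed_profile(seq, index, window))
    return [i for i, score in enumerate(z) if score >= z_cut]


if __name__ == "__main__":
    # Toy sequence: a Pro/Gly-rich stretch between two hydrophobic stretches.
    print(candidate_linker_positions("AVLAVLPGPGAVLAVL", TOY_INDEX))
```

With a real index, the window size and cutoff would need to be tuned against a dataset with annotated linkers; the values here are placeholders.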

2.
Prediction of protein interdomain linker regions by a hidden Markov model   Total citations: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Our aim was to predict protein interdomain linker regions using sequence alone, without requiring known homology. Identifying linker regions will delineate domain boundaries, and can be used to computationally dissect proteins into domains prior to clustering them into families. We developed a hidden Markov model of linker/non-linker sequence regions using a linker index derived from amino acid propensity. We employed an efficient Bayesian estimation of the model using Markov Chain Monte Carlo, Gibbs sampling in particular, to simulate parameters from the posteriors. Our model recognizes sequence data to be continuous rather than categorical, and generates a probabilistic output. RESULTS: We applied our method to a dataset of protein sequences in which domains and interdomain linkers had been delineated using the Pfam-A database. The prediction results are superior to a simpler method that also uses linker index.

3.
4.
Tanaka T  Yokoyama S  Kuroda Y 《Biopolymers》2006,84(2):161-168
Protein dissection into structural domains that can fold in isolation is an important issue in both functional and structural proteomics. Here, we analyzed inter- and intradomain loop sequences (respectively named domain linker and nonlinker loops) and computed a domain linker likelihood score, which was used for developing a domain boundary prediction protocol. The analysis confirmed our previous results indicating that the amino acid composition in terms of glycine, proline, aspartic acid, asparagine, lysine, and histidine significantly differs between linker and nonlinker loops. However, a detailed examination revealed that the amino acid composition bias actually depends on the loop length. Indeed, significant frequency deviations were observed for glycine, proline, and aspartic acid in short linker and nonlinker loops, whereas deviations were observed for aspartic acid, proline, asparagine, and lysine in long linker and nonlinker loops. Finally, we incorporated this loop-length-dependent amino acid composition bias in a simple linker prediction protocol, which predicted linkers with a 40.6% specificity and a 36.1% sensitivity. These figures are 4.4 and 2.4% higher than those obtained with our former prediction protocol that does not incorporate loop-length-dependent characteristics. This result should have practical significance for experimental protein dissection, since the probability of obtaining a stably folding structural domain by randomly dissecting a protein sequence is estimated to be 12.6%.

5.
Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator is used to assign a likelihood value for each residue of the query sequence as belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for one-domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS and on a database that contains different domain definitions than those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.

6.
The overall function of a multi‐domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence alignment‐based methods commonly utilize domain‐level information and provide classification only at the level of domains. Such methods cannot take into account the contributions of the other domains in a protein, or of domain‐linker regions, when classifying multi‐domain proteins. An alignment‐free protein sequence comparison tool, CLAP (CLAssification of Proteins), was previously developed in our laboratory specifically to handle multi‐domain protein sequences without requiring the definition of domain boundaries or of the sequential order of domains. Through this method we aim to achieve a biologically meaningful classification scheme for multi‐domain protein sequences. In this article, CLAP‐based classification has been explored on 5 datasets of multi‐domain proteins and we present detailed analysis for proteins containing (1) Tyrosine phosphatase and (2) SH3 domain. At the domain level, the CLAP‐based classification scheme resulted in a clustering similar to that obtained from an alignment‐based method. CLAP‐based clusters obtained for full‐length datasets were shown to comprise proteins with similar functions and domain architectures. Our study demonstrates that multi‐domain proteins could be classified effectively by considering full‐length sequences without a requirement of identification of domains in the sequence.

7.
Sulfurtransferases/rhodaneses are a group of enzymes widely distributed in plants, animals, and bacteria that catalyze the transfer of sulfur from a donor molecule to a thiophilic acceptor substrate. Sulfurtransferases (STs) consist of two globular domains of nearly identical size and conformation connected by a short linker sequence. In plant STs this linker sequence is considerably longer than in sequences from other species. The Arabidopsis ST1 protein (AJ131404) contains five cysteine residues: one residue is universally conserved in all STs and considered to be catalytically essential; a second one, closely located in the primary sequence, is conserved only in sequences from eukaryotic species. Of the remaining three cysteine residues, two are conserved in the plant STs known so far and one is unique to the Arabidopsis ST1. The aim of our study was to investigate the role of the two-domain structure, of the unique plant linker sequence and of each cysteine residue. The N- and C-terminal domains of the Arabidopsis ST1, the full-length protein with a shortened linker sequence and several point-mutated proteins were overexpressed in E. coli, purified and used for enzyme activity measurements. The C-terminal domain itself displayed ST activity which could be increased by adding the separately prepared N-terminal domain. The activity of an ST1 derivative with a shortened linker sequence was reduced by more than 60% relative to the wild-type activity, probably because of a drastically reduced protein stability. The replacement of each cysteine residue resulted in mutant forms which differed significantly in their stability, in the specific ST activities, and in their kinetic parameters which were determined for 3-mercaptopyruvate as well as thiosulfate as sulfur substrates: mutation of the putative active site cysteine (C332) essentially abolished activity; for C339 a crucial role at least for the turnover of thiosulfate could be identified.

8.
DomCut: prediction of inter-domain linker regions in amino acid sequences   Total citations: 2 (self-citations: 0, citations by others: 2)
DomCut is a program to predict inter-domain linker regions solely from amino acid sequence information. The prediction is made using a linker index deduced from a data set of domain/linker segments. The linker preference profile, which is the averaged linker index along a sequence, can be visualized in the graphical interface.
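As a rough sketch of how a linker index might be deduced from domain/linker segments and then averaged along a sequence, the code below uses a log ratio of residue frequencies. The log-ratio form, its sign convention (positive means enriched in linkers), the pseudocount, the window size, and the toy segments are all assumptions made for illustration; the abstract does not give DomCut's actual formula, and published indices may use the opposite sign.

```python
import math
from collections import Counter


def composition(segments):
    """Relative frequency of each residue across a list of sequence segments."""
    counts = Counter("".join(segments))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def linker_index(linker_segments, domain_segments, pseudo=1e-3):
    """Log ratio of linker to domain frequency per residue (assumed form).

    Positive values mean the residue is enriched in linker segments.
    The pseudocount keeps the ratio finite for residues seen in only one set.
    """
    f_link = composition(linker_segments)
    f_dom = composition(domain_segments)
    residues = set(f_link) | set(f_dom)
    return {
        aa: math.log((f_link.get(aa, 0.0) + pseudo) / (f_dom.get(aa, 0.0) + pseudo))
        for aa in residues
    }


def preference_profile(seq, index, window=5):
    """Average the index over a centered window at each position."""
    raw = [index.get(res, 0.0) for res in seq]  # unknown residues score 0
    half = window // 2
    profile = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        profile.append(sum(raw[lo:hi]) / (hi - lo))
    return profile


if __name__ == "__main__":
    # Toy training segments, invented for this example.
    idx = linker_index(["PGPG", "GPS"], ["AVLAVL", "LLVA"])
    print(preference_profile("AVLPGPGAVL", idx))
```

Peaks in this profile would mark candidate linker regions under the sign convention chosen here.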

9.
Protein domains are the fundamental structural and functional units of proteins. Information on protein domain boundaries is helpful in understanding the evolution, structures and functions of proteins, and also plays an important role in protein classification. In this paper, we propose a support vector regression-based method to address the problem of protein domain boundary identification based on novel input profiles extracted from the AAindex database. Our method achieves an average sensitivity of ∼36.5% and an average specificity of ∼81% for multi-domain protein chains, which is overall better than the performance of published approaches for identifying domain boundaries. Because it uses sequence information alone, our method is also simpler and faster.

10.
Georges E 《Biochemistry》2007,46(25):7337-7342
P-Glycoprotein (or ABCB1) has been shown to cause multidrug resistance in tumor cell lines selected with lipophilic anticancer drugs. ABCB1 encodes a duplicated molecule with two hydrophobic and hydrophilic domains linked by a highly charged region of approximately 90 amino acids, the "linker domain" with as yet unknown function(s). In this report, we demonstrate a role for this domain in binding to other cellular proteins. Using overlapping hexapeptides that encode the entire amino acid sequence of the linker domain of human ABCB1, we show a direct and specific binding between sequences in the linker domain and several intracellular proteins. Three different polypeptide sequences [617EKGIYFKLVTM627 (LDS617-627), 657SRSSLIRKRSTRRSVRGSQA676 (LDS657-676), and 693PVSFWRIMKLNLT705 (LDS693-705)] in the linker domain interacted tightly with several proteins with apparent molecular masses of approximately 80, 57, and 30 kDa. Interestingly, only the 57 kDa protein (or P57) interacted with all three different sequences of the linker domain. Purification and partial N-terminal amino acid sequencing of P57 showed that it encodes the N-terminal amino acids of alpha- and beta-tubulins. The identity of the P57 interacting protein as tubulins was further confirmed by Western blotting using monoclonal antibodies to alpha- and beta-tubulin. Taken together, the results of this study provide the first evidence for ABCB1 protein interaction mediated by sequences in the linker domain. These findings are likely to provide further insight into the functions of ABCB1 in normal and drug resistant tumor cells.

11.
According to their main EC (Enzyme Commission) numbers, enzymes are classified into the following 6 main classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. A new method has been developed to predict the enzymatic attribute of proteins by introducing the functional domain composition to formulate a given protein sequence. The advantage of doing so is that both the sequence-order-related features and the function-related features are naturally incorporated in the predictor. As a demonstration, the jackknife cross-validation test was performed on a dataset consisting of proteins with less than 20% sequence identity to each other, in order to avoid homology bias. The overall success rate thus obtained was 85% in identifying the enzyme family classes (including the identification of nonenzyme protein sequences as well). The success rate is significantly higher than those obtained by the other methods on such a stringent dataset. This indicates that using the functional domain composition to represent protein samples for statistical prediction is indeed very promising, and will become a powerful tool in bioinformatics and proteomics.
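The jackknife (leave-one-out) test mentioned in this abstract can be sketched as follows. The abstract does not specify the classifier, so a simple nearest-neighbor rule stands in for it here; the binary feature vectors (standing for presence or absence of hypothetical functional domains) and the class labels are toy data invented for the example.

```python
def nearest_neighbor(train_x, train_y, query):
    """Predict the label of the training sample closest to the query (squared Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(range(len(train_x)), key=lambda j: dist(train_x[j], query))
    return train_y[best]


def jackknife_success_rate(samples, labels, classify):
    """Hold out each sample in turn, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    # Toy data: each vector marks which of three hypothetical domains a protein contains.
    samples = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1)]
    labels = ["hydrolase", "hydrolase", "ligase", "ligase"]
    print(jackknife_success_rate(samples, labels, nearest_neighbor))
```

Because every sample is held out exactly once, the result does not depend on a random split, which is one reason this test is often preferred for small benchmark datasets.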

12.
Prediction of protein domain with mRMR feature selection and analysis   Total citations: 2 (self-citations: 0, citations by others: 2)
Li BQ  Hu LL  Chen L  Feng KY  Cai YD  Chou KC 《PloS one》2012,7(6):e39308
Domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop effective methods for predicting protein domains from sequence information alone, so as to facilitate the structure prediction of proteins and speed up their functional annotation. However, although many efforts have been made in this regard, prediction of protein domains from sequence information still remains a challenging and elusive problem. Here, a new method was developed by combining the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), as well as by incorporating the features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28-40% higher than those by the existing method on the same benchmark dataset. Furthermore, an in-depth analysis revealed that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, quite consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool for annotating protein domains, or may, at the very least, play a complementary role to the existing domain prediction methods, and that the findings about the key features with high impact on domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be utilized to study protein signal peptides, B-cell epitopes, HIV protease cleavage sites, among many other important topics in protein science and biomedicine.

13.
We present, to our knowledge, the first quantitative analysis of functional site diversity in homologous domain superfamilies. Different types of functional sites are considered separately. Our results show that most diverse superfamilies are very plastic in terms of the spatial location of their functional sites. This is especially true for protein–protein interfaces. In contrast, we confirm that catalytic sites typically occupy only a very small number of topological locations. Small-ligand binding sites are more diverse than expected, although in a more limited manner than protein–protein interfaces. In spite of the observed diversity, our results also confirm the previously reported preferential location of functional sites. We identify a subset of homologous domain superfamilies where diversity is particularly extreme, and discuss possible reasons for such plasticity, i.e. structural diversity. Our results do not contradict previous reports of preferential co-location of sites among homologues, but rather point at the importance of not ignoring other sites, especially in large and diverse superfamilies. Data on sites exploited by different relatives, within each well annotated domain superfamily, have been made accessible from the CATH website in order to highlight versatile superfamilies or superfamilies with highly preferential sites. This information is valuable for systems biology, and knowledge of any constraints on protein interactions could help in understanding the dynamic control of networks in which these proteins participate.
The novelty of our work lies in the comprehensive nature of the analysis – we have used a significantly larger dataset than previous studies – and the fact that in many superfamilies we show that different parts of the domain surface are exploited by different relatives for ligand/protein interactions, particularly in superfamilies which are diverse in sequence and structure, an observation not previously reported on such a large scale. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.

14.
The crystal structures of the regulated Src and Hck tyrosine kinases show intramolecular interactions between the phosphorylated tail and the SH2 domain as well as between the SH3 domain, the SH2-catalytic domain linker (SH2-CD linker) and the catalytic domain. The relative contribution of these interactions to regulation of activity is poorly understood. Mutational analysis of Src and Lck revealed that interaction of the SH2-CD linker with the SH3 domain is crucial for regulation. Moreover, three sites of interaction of the linker with the catalytic domain, one at the beginning (Trp260) and two at the back of the small lobe, opposite the catalytic cleft (beta2/beta3 loop; alphaC/beta4 loop), impinge on Src activity. Other activating mutations map to the front of the catalytic domain in the loop preceding the alphaC-helix (beta3/alphaC loop). SH2-CD linker mutants are deregulated in mammalian cells but transform fibroblasts weakly, suggesting that the linker may bind cellular components. Interpretation of our results on the basis of the crystal structure of Src favours a model in which the correctly positioned SH2-CD linker exerts an inhibitory function on catalysis of Src family members by facilitating displacement of the alphaC-helix. This study may provide a template for the generation of deregulated versions of other protein kinases.

15.
To facilitate swift structural characterizations, structural genomic/proteomic projects need to divide large multi-domain proteins into structural domains and to determine their structures separately. Thus, the assignment of structural domains based solely on sequence information, especially on the physico-chemical properties of the amino acid sequences, could be very helpful for such projects. In this study, we examined the characteristics of domain linker sequences, which are loop sequences connecting two structural domains. To this end, we prepared a set of 101 non-redundant multi-domain protein sequences with known structures, and performed an analysis of the linker sequences. The analysis revealed that the frequencies of five (Pro, Gly, Asp, Asn, Lys) amino acid residues differed significantly between the linker and non-linker loop sequences. Moreover, we observed a similar deviation for the residue pair frequencies between the two types of loop sequences. Finally, we describe an automated method, based on the above analysis, to detect loops that have high probabilities of being domain linkers in a protein sequence.

16.
To improve our insight into the structure and function of the CFTR R domain, deletion and hybrid constructs in which different parts of the R domain were deleted or replaced by the MDR1 linker domain, and vice versa, were made. Replacement of the linker domain by the R domain did not result in a decrease, and replacement of the CFTR R domain by the linker domain did not result in an increase, of maturation efficiency when compared to the respective wild-type proteins. This indicates that the R domain is not responsible for the high degree of degradation observed for CFTR translation products in the ER, but rather the overall structure or sequences located outside the R domain. Replacing the C-terminal part of the R domain (amino acids 780-830) by the MDR1 linker domain resulted in the appearance of PKA-dependent whole cell chloride currents which were not significantly different from wild-type CFTR currents. This might indicate that the PKA sites present in the linker domain are functional and that what matters is not the exact sequence of the C-terminal part of the R domain, but rather the presence of PKA sites and the length. Moreover, when this hybrid construct was PKC-stimulated, chloride currents were activated. Although these PKC-induced currents were lower than the PKA-induced ones, this again indicates that the linker domain is functional in this hybrid construct. Taken together, these results suggest that the MDR1 linker domain can substitute for part of the regulatory domain of the CFTR protein.

17.
Civera C  Simon B  Stier G  Sattler M  Macias MJ 《Proteins》2005,58(2):354-366
Pleckstrin1 is a major substrate for protein kinase C in platelets and leukocytes, and comprises a central DEP (disheveled, Egl-10, pleckstrin) domain, which is flanked by two PH (pleckstrin homology) domains. DEP domains display a unique alpha/beta fold and have been implicated in membrane binding utilizing different mechanisms. Using multiple sequence alignments and phylogenetic tree reconstructions, we find that 6 subfamilies of the DEP domain exist, of which pleckstrin represents a novel and distinct subfamily. To clarify structural determinants of the DEP fold and to gain further insight into the role of the DEP domain, we determined the three-dimensional structure of the pleckstrin DEP domain using heteronuclear NMR spectroscopy. Pleckstrin DEP shares main structural features with the DEP domains of disheveled and Epac, which belong to different DEP subfamilies. However, the pleckstrin DEP fold is distinct from these structures and contains an additional, short helix alpha4 inserted in the beta4-beta5 loop that exhibits increased backbone mobility as judged by NMR relaxation measurements. Based on sequence conservation, the helix alpha4 may also be present in the DEP domains of regulator of G-protein signaling (RGS) proteins, which are members of the same DEP subfamily. In pleckstrin, the DEP domain is surrounded by two PH domains. Structural analysis and charge complementarity suggest that the DEP domain may interact with the N-terminal PH domain in pleckstrin. Phosphorylation of the PH-DEP linker, which is required for pleckstrin function, could regulate such an intramolecular interaction. This suggests a role of the pleckstrin DEP domain in intramolecular domain interactions, which is distinct from the functions of other DEP domain subfamilies found so far.

18.
One of the major contributors to protein structure is the formation of disulphide bonds between selected pairs of cysteines in the oxidized state. Prediction of such disulphide bridges from sequence is challenging, given that the number of possible cysteine pairings grows rapidly as the number of cysteines in a protein increases. Here, we describe an SVM (support vector machine) model for the prediction of cystine connectivity in a protein sequence with and without a priori knowledge of the bonding state of the cysteines. We make use of a new encoding scheme based on physico-chemical properties and statistical features (probability of occurrence of each amino acid residue in different secondary structure states along with PSI-BLAST profiles). We evaluate our method on the SPX dataset (an extended dataset of SP39 (Swiss-Prot 39) and SP41 (Swiss-Prot 41) with known disulphide information from PDB) and compare our results with the recursive neural network model described for the same dataset.

19.
I-TevI is a member of the GIY-YIG family of homing endonucleases. It is folded into two structural and functional domains, an N-terminal catalytic domain and a C-terminal DNA-binding domain, separated by a flexible linker. In this study we have used genetic analyses, computational sequence analysis and NMR spectroscopy to define the configuration of the N-terminal domain and its relationship to the flexible linker. The catalytic domain is an alpha/beta structure contained within the first 92 amino acids of the 245-amino acid protein followed by an unstructured linker. Remarkably, this structured domain corresponds precisely to the GIY-YIG module defined by sequence comparisons of 57 proteins including more than 30 newly reported members of the family. Although much of the unstructured linker is not essential for activity, residues 93-116 are required, raising the possibility that this region may adopt an alternate conformation upon DNA binding. Two invariant residues of the GIY-YIG module, Arg27 and Glu75, located in alpha-helices, have properties of catalytic residues. Furthermore, the GIY-YIG sequence elements for which the module is named form part of a three-stranded antiparallel beta-sheet that is important for I-TevI structure and function.

20.
Recent advances in protein engineering have come from creating multi-functional chimeric proteins containing modules from various proteins. These modules are typically joined via an oligopeptide linker, the correct design of which is crucial for the desired function of the chimeric protein. Here we analyse the properties of naturally occurring inter-domain linkers with the aim of designing linkers for domain fusion. Two main types of linker were identified: helical and non-helical. Helical linkers are thought to act as rigid spacers separating two domains. Non-helical linkers are rich in prolines, which also leads to structural rigidity and isolation of the linker from the attached domains. This means that both linker types are likely to act as a scaffold to prevent unfavourable interactions between folding domains. Based on these results we have constructed a linker database intended for the rational design of linkers for domain fusion, which can be accessed via the Internet at http://mathbio.nimr.mrc.ac.uk.
