Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super‐fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes as its inputs an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster than the best currently available disorder predictor. Utilizing our time‐efficient predictor, we characterized the abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs, with the majority of proteomes having between 25 and 40%, where higher abundance is characteristic of proteomes that have larger proteins. Our first‐of‐its‐kind large‐scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Proteins 2014; 82:145–158. © 2013 Wiley Periodicals, Inc.
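
The LDR definition quoted in this abstract (30 or more consecutive disordered residues) can be illustrated with a minimal Python sketch. It is not SLIDER, which the abstract describes as a logistic-regression predictor working from sequence features; it only applies the definition to a per-residue disorder annotation that is assumed to be already available. The function name and the 'D'/'O' annotation encoding are hypothetical.

```python
def has_long_disordered_region(disorder_flags, min_length=30):
    """Return True if the flags contain a run of at least
    min_length consecutive disordered (True) residues."""
    run = 0
    for flag in disorder_flags:
        # Extend the current run on a disordered residue, otherwise reset it.
        run = run + 1 if flag else 0
        if run >= min_length:
            return True
    return False


# Hypothetical annotation string: 'D' = disordered, 'O' = ordered.
annotation = "O" * 10 + "D" * 30 + "O" * 5
flags = [residue == "D" for residue in annotation]
print(has_long_disordered_region(flags))
```

The threshold is a parameter so that other length cutoffs can be tried; 30 is the value given in the abstract.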

Seeger MA  Zhang Y  Rice SE 《Proteins》2012,80(10):2437-2446
Kinesin motor proteins transport a wide variety of molecular cargoes in a spatially and temporally regulated manner. Kinesin motor domains, which hydrolyze ATP to produce a directed mechanical force along a microtubule, are well conserved throughout the entire superfamily. Outside of the motor domains, kinesin sequences diverge along with their transport functions. The nonmotor regions, particularly the tails, respond to a wide variety of structural and molecular cues that enable kinesins to carry specific cargoes in response to particular cellular signals. Here, we demonstrate that intrinsic disorder is a common structural feature of kinesins. A bioinformatics survey of the full‐length sequences of all 43 human kinesins predicts that significant regions of intrinsically disordered residues are present in all kinesins. These regions are concentrated in the nonmotor domains, particularly in the tails and near sites for ligand binding or post‐translational modifications. In order to experimentally verify these predictions, we expressed and purified the tail domains of kinesins representing three different families (Kif5B, Kif10, and KifC3). Circular dichroism and NMR spectroscopy experiments demonstrate that the isolated tails are disordered in vitro, yet they retain their functional microtubule‐binding activity. On the basis of these results, we propose that intrinsic disorder is a common structural feature that confers functional specificity to kinesins. Proteins 2012;. © 2012 Wiley Periodicals, Inc.  相似文献   

Lise S  Jones DT 《Proteins》2005,58(1):144-150
The relationship between amino acid sequence and intrinsic disorder in proteins is investigated. Two databases, one of disordered proteins and the other of globular proteins, are analyzed and compared in order to extract simple sequence patterns of a few amino acids or amino acid properties that characterize disordered segments. It is found that a number of reliable, nonrandom associations exist. In particular, two types of patterns appear to be recurrent: a proline-rich pattern and a (positively or negatively) charged pattern. These results indicate that local sequence information can determine disordered regions in proteins. The derived patterns provide some insights into the physical reasons for disordered structures. They should also be helpful in improving currently available prediction methods.

A growing number of proteins are being identified that are biologically active though intrinsically disordered, in sharp contrast with the classic notion that proteins require a well-defined globular structure in order to be functional. At the same time, recent work showed that aggregation and amyloidosis are initiated in amino acid sequences that have specific physico-chemical properties in terms of secondary structure propensities, hydrophobicity and charge. In intrinsically disordered proteins (IDPs) such sequences would be almost exclusively solvent-exposed and therefore cause serious solubility problems. Further, some IDPs such as the human prion protein, synuclein and Tau protein are related to major protein conformational diseases. However, this scenario contrasts with the large number of unstructured proteins identified, especially in higher eukaryotes, and the fact that the solubility of these proteins is often particularly good. We have used the algorithm TANGO to compare the beta-aggregation tendency of a set of globular proteins derived from SCOP with that of a set of 296 experimentally verified, non-redundant IDPs and of a set of IDPs predicted by the algorithms DisEMBL and GlobPlot. Our analysis shows that the beta-aggregation propensity of all-alpha, all-beta and mixed alpha/beta globular proteins as well as membrane-associated proteins is fairly similar. This illustrates firstly that globular structures possess an appreciable amount of structural frustration and secondly that beta-aggregation is not determined by hydrophobicity and beta-sheet propensity alone. We also show that globular proteins contain almost three times as many aggregation-nucleating regions as IDPs and that the formation of highly structured globular proteins comes at the cost of a higher beta-aggregation propensity because both structure and aggregation obey very similar physico-chemical constraints. Finally, we discuss the fact that although IDPs have a much lower aggregation propensity than globular proteins, this does not necessarily mean that they have a lower potential for amyloidosis.

The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (+/-20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.  相似文献   

Lee J 《Proteins》2006,65(2):453-462
Many of the recent secondary structure prediction methods incorporate the idea of fuzzy set theory, where instead of assigning a definite secondary structure to a query residue, probability for the residue being in each of the conformational states is estimated. Moreover, continuous assignment of conformational states to the experimentally observed protein structures can be performed in order to reflect inherent flexibility. Although various measures have been developed for evaluating performances of secondary structure prediction methods, they depend only on the most probable secondary structures. They do not assess the accuracy of the probabilities produced by fuzzy prediction methods, and they cannot incorporate information contained in continuous assignments of conformational states to observed structures. Three important measures for evaluating performance of a secondary structure prediction algorithm, Q score, Segment OVerlap (SOV) measure, and the k-state correlation coefficient (Corr), are deformed into fuzzy measures F score, Fuzzy OVerlap (FOV) measure, and the fuzzy correlation coefficient (Forr), so that the new measures not only assess probabilistic outputs of fuzzy prediction methods, but also incorporate information from continuous assignments of secondary structure. As an example of application, prediction results of four fuzzy secondary structure prediction methods, PSIPRED, PROFking, SABLE, and PREDICT, are assessed using the new fuzzy measures.  相似文献   

Bartlett GJ  Taylor WR 《Proteins》2008,71(2):950-959
Distinguishing native from non-native folds remains a challenging problem for protein structure prediction. We describe a method, SCA-distance scoring, based on results from statistical coupling analysis which discriminates between native and non-native folds produced by a de novo protein structure prediction method for four out of five test proteins. The method is particularly good at discriminating non-native folds which are close in RMSD to the true fold but contain a change in an internal structural element. SCA-distance scoring is a useful addition to the tools available for distinguishing native from non-native folds in protein structure prediction.

The location of certain amino acid sequences, such as repeats, along the polypeptide chain is very important in forming the overall shape of the protein molecule, which in turn determines its function. In gram‐positive bacteria, fibronectin‐binding protein (FnBP) is one such repeat‐containing protein; it is a cell wall‐attached protein responsible for various acute infections in humans. Several studies on the sequence, structure, and function of the fibronectin‐binding regions of FnBPs have been reported; however, no detailed study has been carried out on the full‐length protein sequence. In the present study, we carried out a thorough sequence and structure analysis of FnBP_A of Staphylococcus aureus and explored the dual ligand‐binding ability of the fibrinogen (fg)‐binding region and its molecular recognition processes. Multiple sequence alignment and protein‐protein docking analysis reveal the regions that are likely involved in dual ligand binding. Further analysis of the docking of the FnBP_A fg‐binding region and fn N‐terminal modules suggests that if the latter bind to the fg‐binding region of FnBP_A, they would inhibit the subsequent binding of fg because of steric hindrance. The sequence analysis further suggests that the abundance of the disorder‐promoting residue glutamic acid and the dual‐personality (both order‐ and disorder‐promoting) residue threonine in the tandem repeats of the FnBP_A and B proteins may help the molecule undergo a conformational change while binding fn by the β‐zipper mechanism. A segment‐based power spectral analysis was carried out, which helps to understand the distribution of hydrophobic residues along the sequence, particularly in the intrinsically disordered tandem repeats. The results presented here will help to understand the role of internal repeats and intrinsic disorder in the molecular recognition process of a pathogenic cell surface protein.

To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprised of 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins.  相似文献   
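
The per-residue comparison of observed and predicted secondary structure mentioned in this abstract can be sketched as follows. This is a generic illustration of per-residue and per-state accuracy, not the authors' evaluation code; the function names, the three-letter state alphabet (H, E, C), and the example strings are assumptions made for illustration.

```python
def per_residue_accuracy(observed, predicted):
    """Percentage of positions where the predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def per_state_accuracy(observed, predicted, state):
    """Percentage of residues observed in `state` that are also predicted in `state`.
    Returns None if the state never occurs in the observed string."""
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        return None
    correct = sum(predicted[i] == state for i in positions)
    return 100.0 * correct / len(positions)


# Hypothetical example: H = alpha-helix, E = beta-strand, C = coil.
observed = "HHHHEEEECC"
predicted = "HHHCEEECCC"
print(per_residue_accuracy(observed, predicted))
print(per_state_accuracy(observed, predicted, "H"))
```

Reporting the per-state values next to the overall value shows whether a prediction is balanced across the three states or is carried by one of them.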

Many biologically active proteins, which are usually called intrinsically disordered or natively unfolded proteins, lack stable tertiary and/or secondary structure under physiological conditions in vitro. Their functions complement the functional repertoire of ordered proteins, with intrinsically disordered proteins (IDPs) often being involved in regulation, signaling and control. Their amino acid sequences and compositions are very different from those of ordered proteins, making reliable identification of IDPs possible at the proteome level. IDPs are highly abundant in various human diseases, including neurodegeneration and other protein dysfunction maladies and, therefore, represent attractive novel drug targets. Some of the aspects of IDPs, as well as their roles in neurodegeneration and protein dysfunction diseases, are discussed in this article, together with the peculiarities of IDPs as potential drug targets.

Wintjens R  Gilis D  Rooman M 《Proteins》2008,70(4):1564-1577
Fe- and Mn-containing superoxide dismutase (sod) enzymes are closely related and similar in both amino acid sequence and structure, but differ in their mode of oligomerization and in their specificity for the Fe or Mn cofactor. The goal of the present work is to identify and analyze the sequence and structure characteristics that ensure the cofactor specificities and the oligomerization modes. For that purpose, 374 sod sequences and 17 sod crystal structures were collected and aligned. These alignments were searched for residues and inter-residue interactions that are conserved within the whole sod family, or alternatively, that are specific to a given sod subfamily sharing common characteristics. This led us to define key residues and inter-residue interaction fingerprints in each subfamily. The comparison of these fingerprints allows, on a rational basis, the design of mutants likely to modulate the activity and/or specificity of the target sod, in good agreement with the available experimental results on known mutants. The key residues and interaction fingerprints are furthermore used to predict if a novel sequence corresponds to a sod enzyme, and if so, what type of sod it is. The predictions of this fingerprint method reach much higher scores and present much more discriminative power than the commonly used method that uses pairwise sequence comparisons.  相似文献   

The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third may be a portion of the catalytic center. Phylogenetic trees were calculated to obtain insights into the evolutionary history of the family. Subfamilies with varying degrees of intra-relatedness could be discriminated; these relationships are also supported by analysis of sequences outside the canonical AAA boxes: within the cassette are regions that are strongly conserved within each subfamily, whereas little or even no similarity between different subfamilies can be observed. These regions are well suited to define fingerprints for subfamilies. A secondary structure prediction utilizing all available sequence information was performed and the result was fitted to the general 3D structure of a Walker A/GTPase. The agreement was unexpectedly high and strongly supports the conclusion that the AAA family belongs to the Walker superfamily of A/GTPases.  相似文献   

Using site-directed spin-labeling EPR spectroscopy, we mapped the region of the intrinsically disordered C-terminal domain of measles virus nucleoprotein (N(TAIL)) that undergoes induced folding. In addition to four spin-labeled N(TAIL) variants (S407C, S488C, L496C, and V517C) (Morin et al. (2006), J Phys Chem 110: 20596-20608), 10 new single-site cysteine variants were designed, purified from E. coli, and spin-labeled. These 14 spin-labeled variants enabled us to map in detail the gain of rigidity of N(TAIL) in the presence of either the secondary structure stabilizer 2,2,2-trifluoroethanol or the C-terminal domain X (XD) of the viral phosphoprotein. Different regions of N(TAIL) were shown to contribute to a different extent to the binding to XD, while the mobility of the spin labels grafted at positions 407 and 460 was unaffected upon addition of XD; that of the spin labels grafted within the 488-502 and the 505-522 regions was severely and moderately reduced, respectively. Furthermore, EPR experiments in the presence of 30% sucrose allowed us to precisely map to residues 488-502, the N(TAIL) region undergoing alpha-helical folding. The mobility of the 488-502 region was found to be restrained even in the absence of the partner, a behavior that could be accounted for by the existence of a transiently populated folded state. Finally, we show that the restrained motion of the 505-522 region upon binding to XD is due to the alpha-helical transition occurring within the 488-502 region and not to a direct interaction with XD.  相似文献   

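Cloning and sequence analysis of the NiFe-hydrogenase gene of Klebsiella pneumoniae   Total citations: 1, self-citations: 0, citations by others: 1
The amino acid sequences of the NiFe-hydrogenase large subunits reported in the Swiss-Prot protein database were aligned with the CLUSTAL-W software, conserved regions were identified, and degenerate primers were designed from them. PCR with one of the primer pairs yielded a DNA sequence of about 1000 bp, and primers designed from this sequence were used in inverse PCR to obtain the sequence of the entire NiFe-hydrogenase. Bioinformatics software was then used to predict the secondary and tertiary structure of this hydrogenase and to dock its large and small subunits. The results indicate that the NiFe-hydrogenase of Klebsiella pneumoniae belongs to a class of membrane-bound hydrogen-evolving hydrogenases (Ech hydrogenases).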

The amino acid sequences of soluble, ordered proteins with stable structures have evolved due to biological and physical requirements, thus distinguishing them from random sequences. Previous analyses have focused on extracting the features that frequently appear in protein substructures, such as α‐helix and β‐sheet, but the universal features of protein sequences have not been addressed. To clarify the differences between native protein sequences and random sequences, we analyzed 7368 soluble, ordered protein sequences, by inspecting the observed and expected occurrences of 400 amino acid pairs in local proximity, up to 10 residues along the sequence in comparison with their expected occurrence in random sequence. We found the trend that the hydrophobic residue pairs and the polar residue pairs are significantly decreased, whereas the pairs between a hydrophobic residue and a polar residue are increased. This trend was universally observed regardless of the secondary structure content but was not observed in protein sequences that include intrinsically disordered regions, indicating that it can be a general rule of protein foldability. The possible benefits of this rule are discussed from the viewpoints of protein aggregation and disorder, which are both caused by low‐complexity regions of hydrophobic or polar residues.  相似文献   
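
The comparison of observed and expected pair occurrences described in this abstract can be sketched in Python. The abstract does not state the exact statistic, so the version below is an assumption: the expected count for a random sequence is taken as the number of residue positions pairs at the chosen separation multiplied by the overall frequencies of the two residues. The function name and the toy input are hypothetical.

```python
from collections import Counter


def pair_ratio(sequences, first, second, separation):
    """Observed/expected ratio for residue `first` followed by residue
    `second` exactly `separation` positions later along the sequence.
    Returns None when the expected count is zero."""
    composition = Counter()
    observed = 0
    total_pairs = 0
    for seq in sequences:
        composition.update(seq)
        for i in range(len(seq) - separation):
            total_pairs += 1
            if seq[i] == first and seq[i + separation] == second:
                observed += 1
    total_residues = sum(composition.values())
    if total_pairs == 0 or total_residues == 0:
        return None
    # Expected count if residues were placed independently (random sequence
    # with the same overall composition).
    expected = (total_pairs
                * composition[first] / total_residues
                * composition[second] / total_residues)
    if expected == 0:
        return None
    return observed / expected


# Toy input; a real analysis would loop over all 400 residue pairs
# and separations 1 to 10 on a large sequence set.
toy = ["ALALALAL"]
print(pair_ratio(toy, "A", "L", 1))
print(pair_ratio(toy, "A", "A", 1))
```

A ratio above 1 means the pair occurs more often than expected in a random sequence of the same composition, and a ratio below 1 means it is depleted.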

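Cloning and sequence analysis of the response regulator gene of Lactobacillus paracasei   Total citations: 1, self-citations: 0, citations by others: 1
Based on the amino acid sequence similarity within the HPK10 subfamily, two primer pairs were designed to obtain the sequence of the conserved region of the HPK10 subfamily in Lactobacillus paracasei HD1.7; this fragment is 330 bp long. This sequence was aligned and analyzed and degenerate primers were designed from it, and amplification yielded the entire response regulator (RR) sequence, which is 807 bp long. Bioinformatics software was used to carry out homology analysis, amino acid composition analysis, hydrophobicity analysis, phosphorylation site prediction, CDS analysis, and secondary and tertiary structure prediction for this sequence. The results indicate that the cloned fragment is the response regulator of L. paracasei HD1.7.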

The conformational parameters $P_k$ for each amino acid species ($j = 1$–$20$) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residues in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{av} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{av}$ to convert a multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n) \sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. However, this is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
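
The contrast this abstract draws between a geometric mean (computed as an average of logarithms) and an arithmetic mean can be made concrete with a short Python sketch. The parameter values below are made up for illustration and are not taken from the paper or from the Chou-Fasman tables; the function names are likewise hypothetical.

```python
import math


def geometric_mean_parameter(params):
    """Geometric mean of per-residue conformational parameters,
    computed as exp of the average of their natural logarithms."""
    if not params or any(p <= 0 for p in params):
        raise ValueError("parameters must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(p) for p in params) / len(params))


def arithmetic_mean_parameter(params):
    """Plain arithmetic mean, shown for comparison only."""
    if not params:
        raise ValueError("parameters must not be empty")
    return sum(params) / len(params)


# Made-up parameters for a four-residue segment that contains one
# strong breaker (0.5) among formers.
segment = [1.5, 1.2, 0.5, 1.4]
print(geometric_mean_parameter(segment))
print(arithmetic_mean_parameter(segment))
```

The geometric mean never exceeds the arithmetic mean, and the gap widens as the values spread out, which matches the abstract's remark that the two approaches diverge when strong formers and breakers are present.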

S. Rackovsky 《Proteins》2013,81(10):1681-1685
Delineation of the relationship between sequence and structure in proteins has proven elusive. Most studies of this problem use alignment methods and other approaches based on the characteristics of individual residues. It is demonstrated herein that the sequence‐structure relationship is determined in significant part by global characteristics of sequence organization. Information encoded in complete sequences is required to distinguish proteins in different architectural groups. It is found that the statistically significant differences between sequences encoding different architectures are encoded in a surprisingly small set of low‐wave‐number sequence periodicities. It would therefore appear that unexpected simplicity in an appropriately defined Fourier space may be an inherent characteristic of the sequences of folded proteins. Proteins 2013; 81:1681–1685. © 2013 Wiley Periodicals, Inc.
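
The idea of low-wave-number sequence periodicities can be illustrated with a small discrete Fourier transform in Python. This is only a sketch under stated assumptions: the abstract does not say which per-residue numerical properties are transformed or how the spectra are normalized, so the input here is a synthetic numeric profile and the normalization (squared magnitude divided by length) is an arbitrary choice. The function name is hypothetical.

```python
import cmath
import math


def power_spectrum(values, max_wave_number):
    """Power at wave numbers 1..max_wave_number of a numeric profile,
    computed with a direct discrete Fourier transform after removing the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("profile must not be empty")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    powers = []
    for k in range(1, max_wave_number + 1):
        coefficient = sum(c * cmath.exp(-2j * cmath.pi * k * j / n)
                          for j, c in enumerate(centered))
        powers.append(abs(coefficient) ** 2 / n)
    return powers


# Synthetic profile with exactly two full periods over 16 positions,
# standing in for a per-residue property profile of a sequence.
signal = [math.cos(2 * math.pi * 2 * j / 16) for j in range(16)]
powers = power_spectrum(signal, 4)
print(powers.index(max(powers)) + 1)  # wave number carrying the most power
```

For a real protein, `values` would be a per-residue property profile, and only the first few wave numbers would be compared between architectural groups.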

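cDNA sequence analysis and higher-order protein structure prediction of Japanese flounder alkaline phosphatase   Total citations: 1, self-citations: 0, citations by others: 1
To study the role of alkaline phosphatase (EC; alkaline phosphatase, ALP) in the development and metamorphosis of Japanese flounder (Paralichthys olivaceus), the full-length cDNA of the flounder ALP gene was cloned by RACE, the nucleotide sequence was analyzed by bioinformatics, and the protein structure was predicted. The results show that the flounder ALP cDNA is 1 811 bp long and can encode a protein of 476 amino acids with a molecular weight of 52 293.1 and an isoelectric point of 7.67. The GC content of the coding region differs considerably among ALP homologous genes and is markedly higher in vertebrates than in invertebrates and bacteria. Molecular phylogenetic analysis shows that flounder ALP is highly homologous to the tissue-nonspecific ALPs of the green spotted pufferfish (Tetraodon nigroviridis) and zebrafish (Danio rerio), and that the molecular phylogenetic tree is consistent with the species tree. Several important functional sites in the protein sequence, including metal ion binding sites, N-glycosylation sites, and serine phosphorylation sites, are highly conserved. Flounder ALP and human placental ALP (PALP) share 43% similarity in protein sequence, and their 3D structures are very close. A comparison of the spatial positions of amino acids shows that the cysteines at positions 141 and 203 of flounder ALP correspond to the cysteines at positions 121 and 183 of human PALP and are presumed to form a disulfide bond. In the active centers of the two enzymes, the amino acid residues that bind the 3 metal ions are highly conserved: 2 of the 9 amino acids around the Zn ions differ, and only 2 of the 7 amino acids around the Mg ion differ, including a pair of similar residues, serine 155 and threonine 175.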
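
The GC content of a coding region, which this abstract compares across ALP homologs, can be computed with a few lines of Python. This is a minimal sketch; the function name is hypothetical, the example sequences are made up and are not the flounder ALP sequence, and ignoring ambiguous bases such as N is an assumption about how they should be handled.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)
    of a nucleotide sequence. Other characters are ignored."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in bases)
    return 100.0 * gc / len(bases)


# Made-up fragments, for illustration only.
print(gc_content("ATGGCCAAGCTGTAA"))
print(gc_content("atgcnn"))
```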
