Similar articles
 Found 20 similar articles (search time: 0 ms)
1.
Fujitsuka Y, Chikenji G, Takada S. Proteins, 2006, 62(2):381-398
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse-grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent-induced effects on the basis of physicochemical consideration. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 Å for 12 of 38 proteins; this success rate was the same as that by the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence-specific terms have weak but positive roles. In the benchmark, the proteins predicted well by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than SimFold or Rosetta separately. For each of the 38 proteins, structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping the sampled structural space onto two dimensions. For proteins for which one of the two methods succeeded and the other failed, the successful method had a less scattered ensemble located around the native structure. For proteins for which both methods succeeded, the two ensembles were often intermixed.

2.
Kumar S. Bioinformation, 2011, 6(10):366-369
Filamins are dimeric actin-binding proteins participating in the organization of the actin-based cytoskeleton. Their modular domain organization is made up of an N-terminal actin-binding domain composed of two CH domains followed by flexible rod regions that consist of 24 Ig-like domains. Homology modeling was used to model human filamin using Modeller 9v5. Assessment of the resulting model with Verify 3D and PROCHECK showed that the final model is reliable. Conformational disorder predictions for human filamin residues were also mapped onto the validated structure of human filamin. Prediction of protein disorder in filamin structures will help structural biologists to find suitable targets for analysis and to understand protein function.

3.
Accurate prediction of protein secondary structural content   Total citations: 2 (self-citations: 0, citations by others: 2)
An improved multiple linear regression (MLR) method is proposed to predict a protein's secondary structural content based on its primary sequence. The amino acid composition, the autocorrelation function, and the interaction function of side-chain mass derived from the primary sequence are taken into account. The average absolute errors of prediction over 704 unrelated proteins with the jackknife test are 0.088, 0.081, and 0.059 with standard deviations 0.073, 0.066, and 0.055 for α-helix, β-sheet, and coil, respectively. The requirement that the sum of the predicted secondary structure contents be close to 1.0 was introduced as a criterion to evaluate whether a prediction is acceptable. When only the predictions whose summed secondary structure content lies between 0.99 and 1.01 are accepted (about 11% of all proteins), the absolute errors are 0.058 for α-helix, 0.054 for β-sheet, and 0.045 for coil.
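The acceptance criterion described in this abstract can be illustrated with a minimal Python sketch. The function name `accept_prediction` and its parameters are hypothetical; only the 0.99 to 1.01 window comes from the abstract, and the regression model that produces the three fractions is not reproduced here.

```python
def accept_prediction(helix, sheet, coil, lo=0.99, hi=1.01):
    """Return True when the predicted content fractions sum to within [lo, hi].

    helix, sheet, coil: predicted fractions of each secondary structure class.
    lo, hi: acceptance window (0.99 and 1.01 are the bounds quoted in the abstract).
    """
    total = helix + sheet + coil
    return lo <= total <= hi


# A prediction whose fractions sum to about 1.0 is accepted,
# one whose fractions sum to about 1.1 is rejected.
print(accept_prediction(0.35, 0.25, 0.40))
print(accept_prediction(0.40, 0.30, 0.40))
```

Filtering on this window trades coverage for accuracy: the abstract reports that only about 11% of proteins pass it.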

4.
Tjalsma H, van Dijl JM. Proteomics, 2005, 5(17):4472-4482
The availability of complete bacterial genome sequences allows proteome-wide predictions of exported proteins that are potentially retained in the cytoplasmic membranes of the corresponding organisms. In practice, however, major problems are encountered with the computer-assisted distinction between (Sec-type) signal peptides that direct exported proteins into the growth medium and lipoprotein signal peptides or amino-terminal membrane anchors that cause protein retention in the membrane. In the present studies, which were aimed at improving methods to predict protein retention in the bacterial cytoplasmic membrane, we have compared sets of membrane-attached and extracellular proteins of Bacillus subtilis that were recently identified through proteomics approaches. The results showed that three classes of membrane-attached proteins can be distinguished. Two classes include 43 lipoproteins and 48 proteins with an amino-terminal transmembrane segment, respectively. Remarkably, a third class includes 31 proteins that remain membrane-retained despite the presence of typical Sec-type signal peptides with consensus signal peptidase recognition sites. This unprecedented finding indicates that unknown mechanisms are involved in membrane retention of this class of proteins. A further novelty is a consensus sequence indicative of the release of certain lipoproteins from the membrane by proteolytic shaving. Finally, using non-overlapping sets of secreted and membrane-retained proteins, the accuracy of different signal peptide prediction algorithms was assessed. Accuracy for the prediction of protein retention in the membrane was increased to 82% using a majority-vote approach. Our findings provide important leads for future identification of surface proteins from pathogenic bacteria, which are attractive candidate infection markers and potential targets for drugs or vaccines.
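A majority-vote combination of several predictors, as mentioned at the end of this abstract, might look like the following Python sketch. The label strings and the tie-handling rule are assumptions made for illustration; the abstract does not say which algorithms were combined or how ties were resolved.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by the most predictors, or None on a tie or empty input.

    labels: one predicted label per algorithm, e.g. "retained" or "secreted"
    (hypothetical label names, not taken from the paper).
    """
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no strict majority among the top labels
    return counts[0][0]


# Three hypothetical predictors voting on one protein.
print(majority_vote(["retained", "secreted", "retained"]))
```

Using an odd number of predictors avoids ties for a two-class problem.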

5.
Recent studies have emphasized the value of including structural information in the topological analysis of protein networks. Here, we utilized structural information to investigate the role of intrinsic disorder in these networks. Hub proteins tend to be more disordered than other proteins (i.e. the proteome average); however, we find this to be true only for those with one or two binding interfaces (‘single’‐interface hubs). In contrast, the distribution of disordered residues in multi‐interface hubs is indistinguishable from the overall proteome. Surprisingly, we find that the binding interfaces in single‐interface hubs are highly structured, as is the case for multi‐interface hubs. However, the binding partners of single‐interface hubs tend to have a higher level of disorder than the proteome average, suggesting that their binding promiscuity is related to the disorder of their binding partners. In turn, the higher level of disorder of single‐interface hubs can be partly explained by their tendency to bind to each other in a cascade. A good illustration of this trend can be found in signaling pathways and, more specifically, in kinase cascades. Finally, our findings have implications for the current controversy related to party and date‐hubs.

6.
More than 60 prediction methods for intrinsically disordered proteins (IDPs) have been developed over the years, many of which are accessible on the World Wide Web. Nearly all of these predictors give balanced accuracies in the ~65%–~80% range. Since predictors are not perfect, further studies are required to uncover the role of amino acid residues in native IDP as compared to predicted IDP regions. In the present work, we make use of sequences of 100% predicted IDP regions, false positive disorder predictions, and experimentally determined IDP regions to distinguish the characteristics of native versus predicted IDP regions. A higher occurrence of asparagine is observed in sequences of native IDP regions but not in sequences of false positive predictions of IDP regions. The occurrences of certain combinations of amino acids at the pentapeptide level provide a distinguishing feature in the IDPs with respect to globular proteins. The distinguishing features presented in this paper provide insights into the sequence fingerprints of amino acid residues in experimentally determined as compared to predicted IDP regions. These observations and additional work along these lines should enable improvements in the accuracy of disorder prediction algorithms.

7.

Background  

Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio.

8.
9.
Recent research on intrinsic protein disorder was stimulated by the availability of accurate computational predictors. However, most of these methods are relatively slow, especially for proteome-scale applications, and were shown to produce relatively large errors when estimating disorder at the protein level (in contrast to the residue level), which is defined by the fraction (content) of disordered residues. To this end, we propose a novel support vector Regression-based Accurate Predictor of Intrinsic Disorder (RAPID). Key advantages of RAPID are speed (prediction of an average-size eukaryotic proteome takes < 1 h on a modern desktop computer); sophisticated design (multiple, complementary information sources that are aggregated over an input chain are combined using feature selection); and high-quality and robust predictive performance. Empirical tests on two diverse benchmark datasets reveal that RAPID's predictive performance compares favorably to a comprehensive set of state-of-the-art disorder and disorder content predictors. Drawing on high speed and good predictive quality, RAPID was used to perform large-scale characterization of disorder in 200+ fully sequenced eukaryotic proteomes. Our analysis reveals interesting relations of disorder with structural coverage and chain length, and an unusual distribution of fully disordered chains. We also performed a comprehensive (using 56000+ annotated chains, which doubles the scope of previous studies) investigation of cellular functions and localizations that are enriched in disorder in the human proteome. RAPID, which allows for batch (proteome-wide) predictions, is available as a web server at http://biomine.ece.ualberta.ca/RAPID/.
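The protein-level quantity this abstract refers to, disorder content, is the fraction of residues in a chain that are disordered. The Python sketch below only illustrates that definition from per-residue labels; it is not RAPID's method, which according to the abstract uses support vector regression on aggregated features. The function name and the 0/1 encoding are assumptions.

```python
def disorder_content(residue_labels):
    """Return the fraction of residues labelled disordered.

    residue_labels: sequence of 0/1 flags, 1 meaning "disordered"
    (an assumed encoding chosen for this illustration).
    """
    if not residue_labels:
        raise ValueError("chain must contain at least one residue")
    return sum(residue_labels) / len(residue_labels)


# A 10-residue toy chain with 4 disordered residues.
print(disorder_content([1, 1, 1, 1, 0, 0, 0, 0, 0, 0]))
```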

10.
Intrinsic disorder in cell-signaling and cancer-associated proteins   Total citations: 3 (self-citations: 0, citations by others: 3)
The number of intrinsically disordered proteins known to be involved in cell-signaling and regulation is growing rapidly. To test for a generalized involvement of intrinsic disorder in signaling and cancer, we applied a neural network predictor of natural disordered regions (PONDR VL-XT) to four protein datasets: human cancer-associated proteins (HCAP), signaling proteins (AfCS), eukaryotic proteins from SWISS-PROT (EU_SW) and non-homologous protein segments with well-defined (ordered) 3D structure (O_PDB_S25). PONDR VL-XT predicts ≥30 consecutive disordered residues for 79(±5)%, 66(±6)%, 47(±4)% and 13(±4)% of the proteins from HCAP, AfCS, EU_SW, and O_PDB_S25, respectively, indicating significantly more intrinsic disorder in cancer-associated and signaling proteins as compared to the two control sets. The disorder analysis was extended to 11 additional functionally diverse categories of human proteins from SWISS-PROT. The proteins involved in metabolism, biosynthesis, and degradation together with kinases, inhibitors, transport, G-protein coupled receptors, and membrane proteins are predicted to have at least twofold less disorder than regulatory, cancer-associated and cytoskeletal proteins. In contrast to 44.5% of the proteins from representative non-membrane categories, just 17.3% of the cancer-associated proteins had sequence alignments with structures in the Protein Data Bank covering at least 75% of their lengths. This relative lack of structural information correlated with the greater amount of predicted disorder in the HCAP dataset. A comparison of disorder predictions with the experimental structural data for a subset of the HCAP proteins indicated good agreement between prediction and observation. Our data suggest that intrinsically unstructured proteins play key roles in cell-signaling, regulation and cancer, where coupled folding and binding is a common mechanism.
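The per-protein test used in this abstract, whether a chain contains at least 30 consecutive residues predicted as disordered, can be sketched in Python as follows. The per-residue flags would come from a predictor such as PONDR VL-XT; here they are plain 0/1 inputs, and the function names are hypothetical.

```python
def longest_disordered_run(flags):
    """Return the length of the longest run of consecutive truthy flags."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0  # extend the current run or reset it
        best = max(best, run)
    return best


def has_long_disorder(flags, min_len=30):
    """True when the chain has at least min_len consecutive disordered residues."""
    return longest_disordered_run(flags) >= min_len


# Toy chain: 5 ordered, 30 disordered, 5 ordered residues.
chain = [0] * 5 + [1] * 30 + [0] * 5
print(has_long_disorder(chain))
```

The fraction of proteins in a dataset for which `has_long_disorder` is true corresponds to the kind of percentage the abstract reports per dataset.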

11.
Understanding the folding behavior of proteins is an important and challenging problem in modern molecular biology. In the present investigation, a large number of features representing protein sequences were developed based on sequence autocorrelation weighted by properties of amino acid residues. A genetic algorithm (GA) combined with multiple linear regression (MLR) was employed to select significant features related to protein folding rates, and to build a global predictive model. Moreover, the local lazy regression (LLR) method was also used to predict the protein folding rates. The results indicated that LLR performed much better than the global MLR model. The important properties of amino acid residues affecting protein folding rates were also analyzed. The results of this study will be helpful for understanding the mechanism of protein folding. Our results also demonstrate that features based on amino acid sequence autocorrelation are effective in representing the relationship between protein sequence and folding rates, and that the local method is a powerful tool for predicting protein folding rates.
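One common form of a property-weighted sequence autocorrelation feature is the average product of a residue property at positions separated by a fixed lag. The Python sketch below uses that form as an assumption; the abstract does not state the exact formula or which residue properties were used, so the input is simply a list of per-residue property values.

```python
def autocorrelation(values, lag):
    """Average product of property values at positions i and i + lag.

    values: per-residue property values along the sequence (for example a
    hydrophobicity scale mapped onto the residues; the scale is not specified here).
    lag: positive separation between the paired positions, smaller than len(values).
    """
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    return sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)


# Features for several lags over a toy property profile.
profile = [1.0, 2.0, 3.0, 4.0]
print([autocorrelation(profile, lag) for lag in (1, 2, 3)])
```

Computing this for several lags and several properties yields a fixed-length feature vector regardless of sequence length, which is what makes it usable as regression input.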

12.
At least a quarter of all genes in most genomes contain putative transmembrane (TM) helices, and helical membrane protein interactions are a major component of the overall cellular interactome. However, current experimental techniques for large-scale detection of protein-protein interactions are biased against membrane proteins. Here, we define protein-protein interaction broadly as co-complexation, and develop a weighted-voting procedure to predict interactions among yeast helical membrane proteins by optimally combining evidence based on diverse genome-wide information such as sequence, function, localization, abundance, regulation, and phenotype. We use logistic regression to simultaneously optimize the weights of all evidence sources for best discrimination based on a set of known helical membrane protein interactions. The resulting integrated classifier not only significantly outperforms classifiers based on any single genomic feature, but also does better than a benchmark Naïve Bayes classifier (using a simplifying assumption of conditional independence among features). Finally, we apply the optimized classifier genome-wide, and construct a comprehensive map of the predicted helical membrane protein interactome in yeast. This can serve as a guide for prioritizing further experimental validation efforts.

13.
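Theoretical methods and stages of protein structure prediction   Total citations: 2 (self-citations: 0, citations by others: 2)
Sun Xia, Yin Zhixiang. Journal of Biology (生物学杂志), 2007, 24(1):16-17,15
Protein structure prediction has long been a focus of research. This paper reviews several theoretical methods for protein structure prediction and its different stages.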

14.
To maximise the assignment of function of the proteins encoded by a genome and to aid the search for novel drug targets, there is an emerging need for sensitive methods of predicting protein function on a genome-wide basis. GeneAtlas is an automated, high-throughput pipeline for the prediction of protein structure and function using sequence similarity detection, homology modelling and fold recognition methods. GeneAtlas is described in detail here. To test GeneAtlas, a 'virtual' genome was used: a subset of PDB structures from the SCOP database, in which the functional relationships are known. By building 3D models, GeneAtlas detects additional relationships in comparison with the sequence searching method PSI-BLAST. Functionally related proteins with sequence identity below the twilight zone can be recognised correctly.

15.
Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods directly based on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein data-sets. To improve the prediction accuracy for such low-similarity proteins, different methods have been recently proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of the novel features including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results based on several benchmark data-sets have shown that the integration of new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins as they capture meaningful relationships among AA residues that are far away in protein sequence. The proposed prediction method has also been tested on predicting structural classes for partially disordered proteins, with reasonable prediction accuracy; this is a more difficult problem than structural class prediction for commonly used benchmark data-sets and, to the best of our knowledge, has never been done before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to achieving high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data-sets.

16.

Background

Analyzing the amino acid sequence of an intrinsically disordered protein (IDP) in an evolutionary context can yield novel insights on the functional role of disordered regions and sequence element(s). However, in the case of many IDPs, the lack of evolutionary conservation of the primary sequence can hamper the study of functionality, because the conservation of their disorder profile and ensuing function(s) may not appear in a traditional analysis of the evolutionary history of the protein.

Results

Here we present DisCons (Disorder Conservation), a novel pipelined tool that combines the quantification of sequence- and disorder conservation to classify disordered residue positions. According to this scheme, the most interesting categories (for functional purposes) are constrained disordered residues and flexible disordered residues. The former residues show conservation of both the sequence and the property of disorder and are associated mainly with specific binding functionalities (e.g., short, linear motifs, SLiMs), whereas the latter class corresponds to segments where disorder as a feature is important for function as opposed to the identity of the underlying sequence (e.g., entropic chains and linkers). DisCons therefore helps with elucidating the function(s) arising from the disordered state by analyzing individual proteins as well as large-scale proteomics datasets.

Conclusions

DisCons is an openly accessible sequence analysis tool that identifies and highlights structurally disordered segments of proteins where the conformational flexibility is conserved across homologs, and therefore potentially functional. The tool is freely available both as a web application and as stand-alone source code hosted at http://pedb.vib.be/discons.

17.
Summary: An equivalence between restricted best linear unbiased prediction (and thus restricted selection index) and a particular example of a selection model is presented. Specifically, the equivalence is between restricted selection and a model of selection on the residuals of the general mixed linear model. This result illustrates that restricted selection acts by nonrandomly sampling those genes that act pleiotropically in multiple trait genetic models. An expression for a mixed linear model which includes restrictions is also presented.

18.
Yang C.M., Yang J.S., Yang C.K., Chou C.H. Photosynthetica, 2000, 37(4):499-508
We applied grey system theory to the evaluation of chlorophyll (Chl) degradation in Chamaecyparis Sieb. & Zucc. var. formosana (Hayata) Rehder needle leaves in the Yuanyang Lake Nature Preserve of northern Taiwan. Pigment analysis was finished within 12 h after collecting the samples. Four grey prediction models for the degradation of Chl a, Chl b, and for the change of Chl a/b ratio and water content were established and compared with the results of linear and exponential regression analysis. The residual error and accuracy range show that the grey prediction process is much better than regression analysis. The degradation of Chl a and b contains two phases, one being fast and the other slow.

19.
A secondary structure has been predicted for the C termini of the fibrinogen β and γ chains from an aligned set of homologous protein sequences using a transparent method that extracts conformational information from patterns of variation and conservation, parsing strings, and patterns of amphiphilicity. The structure is modeled to form two domains, the first having a core parallel sheet flanked on one side by at least two helices and on the other by an antiparallel amphiphilic sheet, with an additional helix connecting the two sheets. The second domain is built entirely from β strands. © 1997 Wiley-Liss, Inc.

20.
The abundance of computer software for different types of prediction in DNA and protein sequence analyses raises the problem of adequate ranking of prediction program quality. A single measure of success of predictor software, which adequately ranks the predictors, does not exist. A typical example of such an incomplete measure is the so-called correlation coefficient. This paper provides an overview and short analysis of several different measures of prediction quality. Frequently, some of these measures give results contradictory to each other even when they relate to the same prediction scores. This may lead to confusion. In order to overcome some of the problems, a few new measures are proposed including some variants of a 'generalised distance from the ideal predictor score'; these are based on topological properties, rather than on statistics. In order to provide a sort of a balanced ranking, the averaged score measure (ASM) is introduced. The ASM provides a possibility for the selection of the predictor that probably has the best overall performance. The method presented in the paper applies to the ranking problem of any prediction software whose results can be properly represented in a true positive-false positive framework, thus providing a natural set-up for linear biological sequence analysis.
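Two measures of the kind this abstract discusses can be sketched in Python from a confusion matrix. The first is the standard Matthews correlation coefficient, which is likely what "correlation coefficient" refers to here, though the abstract does not say so explicitly. The second is only an illustrative assumption: a Euclidean distance from the ideal point (sensitivity 1, specificity 1), not the paper's actual "generalised distance" or ASM definitions, which the abstract does not give.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def distance_from_ideal(tp, fp, tn, fn):
    """Euclidean distance of (sensitivity, specificity) from the ideal point (1, 1).

    Illustrative stand-in only; smaller is better, 0.0 is a perfect predictor.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.hypot(1 - sensitivity, 1 - specificity)


# Compare two hypothetical predictors on the same 100 examples.
for name, counts in {"A": (40, 10, 40, 10), "B": (45, 25, 25, 5)}.items():
    print(name, mcc(*counts), distance_from_ideal(*counts))
```

Printing both measures side by side makes it easy to see when they rank predictors differently, which is the kind of disagreement the abstract warns about.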
