20 similar records found (search time: 15 ms)
1.
Background
The analysis of correlation in alignments generates a matrix of predicted contacts between positions in the structure. While these can arise for many reasons, the simplest explanation is that the pair of residues are in contact in the three-dimensional structure and are affecting each other's selection pressure. To analyse these data, a dynamic programming algorithm was developed for parsing secondary structure interactions in predicted contact maps.
Results
The non-local nature of the constraints required an iterated approach (using a “frozen approximation”), but with good starting definitions a single pass was usually sufficient. The method was effective when applied to the transmembrane class of proteins and was error-tolerant even when the signal became degraded. In the globular class of proteins, where the interactions are more limited in extent and more complex, the algorithm still behaved well, classifying most of the important interactions correctly in both a small and a large test case. For the larger protein, this included examples of the algorithm apportioning parts of a single large secondary structure element between two different interactions.
Conclusions
It is expected that the method will be useful as a pre-processor for coarse-grained modelling methods, extending the range of protein tertiary structure prediction to larger proteins or to data that are currently too ‘noisy’ to be used by existing residue-based methods.
2.
Prediction of topological representations of proteins that are geometrically invariant can contribute towards the solution of fundamental open problems in structural genomics, such as folding. In this paper we focus on coarse-grained protein contact maps, a representation that describes the spatial neighborhood relation between secondary structure elements such as helices, beta sheets, and random coils. Our methodology is based on searching the graph space. The search algorithm is guided by an adaptive evaluation function computed by a specialized noncausal recursive connectionist architecture. The neural network is trained using candidate graphs generated during examples of successful searches. Our results demonstrate the viability of the approach for predicting coarse contact maps.
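To make the notion of a coarse-grained contact map concrete, the sketch below collapses a residue-level contact set into a secondary-structure-element (SSE) map. It is not the graph-search method described above; the function name `coarse_contact_map`, the segment format and the `min_contacts` cutoff are illustrative assumptions. Residues outside the listed segments are ignored, so coil regions must be listed as segments if they should count as elements.

```python
def coarse_contact_map(contacts, segments, min_contacts=1):
    """Collapse residue-level contacts into an SSE-level (coarse) contact map.

    contacts:     iterable of (i, j) residue index pairs
    segments:     list of (start, end) inclusive residue ranges, one per SSE
    min_contacts: hypothetical cutoff on the number of linking residue contacts
    """
    # Map each residue index to the SSE that contains it.
    owner = {}
    for k, (start, end) in enumerate(segments):
        for r in range(start, end + 1):
            owner[r] = k

    # Count residue contacts between each pair of distinct SSEs.
    counts = {}
    for i, j in contacts:
        a, b = owner.get(i), owner.get(j)
        if a is None or b is None or a == b:
            continue  # residue outside every segment, or contact inside one SSE
        key = (min(a, b), max(a, b))
        counts[key] = counts.get(key, 0) + 1

    # Mark SSE pairs linked by at least min_contacts residue contacts.
    n = len(segments)
    cmap = [[False] * n for _ in range(n)]
    for (a, b), c in counts.items():
        if c >= min_contacts:
            cmap[a][b] = cmap[b][a] = True
    return cmap


# Hypothetical example: three SSEs and three residue contacts.
cmap = coarse_contact_map({(1, 8), (2, 9), (3, 14)}, [(0, 4), (6, 10), (12, 16)], min_contacts=2)
print(cmap)  # [[False, True, False], [True, False, False], [False, False, False]]
```

Requiring more than one residue contact per SSE pair is one way to make the coarse map less sensitive to isolated, possibly spurious residue contacts.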
3.
4.
Experimentally derived genome-wide protein interaction networks have been useful in elucidating functional information that is not evident from examining individual proteins, but determining these networks is complex and time-consuming. To address this problem, several computational methods for predicting protein networks in novel genomes have been developed. A recent publication by Date and Marcotte describes the use of phylogenetic profiling to elucidate novel pathways in proteomes that have not been experimentally characterized. This method, in combination with other computational methods for generating protein-interaction networks, might help identify novel functional pathways and enhance the functional annotation of individual proteins.
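A minimal sketch of the idea behind phylogenetic profiling, assuming each protein is encoded as a presence/absence vector over a set of genomes and that proteins with (nearly) identical profiles are treated as functionally linked. The Hamming-distance criterion and the names `hamming` and `linked_pairs` are hypothetical choices for illustration; the published method may score profile similarity differently.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_distance=0):
    """Protein pairs whose phylogenetic profiles differ in at most max_distance genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


# Hypothetical profiles: 1 = homolog present in that genome, 0 = absent.
profiles = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 0),
    "geneC": (0, 1, 1, 0, 1),
}
print(linked_pairs(profiles))  # [('geneA', 'geneB')]
```

Raising `max_distance` trades precision for recall: more pairs are linked, but some of them will share a profile only by chance.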
5.
Models of infectious diseases are characterized by a phase transition between extinction and persistence. A challenge in contemporary epidemiology is to understand how the geometry of a host’s interaction network influences disease dynamics close to the critical point of such a transition. Here we address this challenge with the help of moment closures. Traditional moment closures, however, do not provide satisfactory predictions close to such critical points. We therefore introduce a new method for incorporating longer-range correlations into existing closures. Our method is technically simple, remains computationally tractable and significantly improves the approximation’s performance. Our extended closures thus provide an innovative tool for quantifying the influence of interaction networks on spatially or socially structured disease dynamics. In particular, we examine the effects of a network’s clustering coefficient, as well as of new geometrical measures, such as a network’s square clustering coefficients. We compare the relative performance of different closures from the literature, with or without our long-range extension. In this way, we demonstrate that the normalized version of the Bethe approximation, extended to incorporate long-range correlations according to our method, is an especially good candidate for studying influences of network structure. Our numerical results highlight the importance of the clustering coefficient and the square clustering coefficient for predicting disease dynamics at low and intermediate values of transmission rate, and demonstrate the significance of path redundancy for disease persistence.
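For orientation, the sketch below integrates the standard SIS pair approximation on a random n-regular network, the kind of traditional moment closure the abstract refers to. It is not the authors' long-range extension or the normalized Bethe approximation: triples are closed with [ABC] ≈ ((n−1)/n)[AB][BC]/[B], clustering is ignored, and the function name, the forward-Euler scheme and the parameter values are assumptions. Pair counts are ordered, so [SS] + 2[SI] + [II] stays equal to nN.

```python
def sis_pair_approximation(N, n, tau, gamma, i0, dt=0.01, steps=1000):
    """Forward-Euler integration of the SIS pair approximation on an n-regular network.

    N: number of nodes, n: degree, tau: per-edge transmission rate,
    gamma: recovery rate, i0: initial infected fraction (placed at random).
    Returns the final (S, I, SS, SI, II) counts.
    """
    S, I = N * (1 - i0), N * i0
    SS = n * N * (1 - i0) ** 2
    SI = n * N * i0 * (1 - i0)
    II = n * N * i0 ** 2
    k = (n - 1) / n
    for _ in range(steps):
        # Pair closure for triples centred on a susceptible node.
        SSI = k * SS * SI / S if S > 0 else 0.0
        ISI = k * SI * SI / S if S > 0 else 0.0
        dS = gamma * I - tau * SI
        dSS = 2 * gamma * SI - 2 * tau * SSI
        dSI = gamma * (II - SI) + tau * (SSI - ISI - SI)
        dII = -2 * gamma * II + 2 * tau * (ISI + SI)
        S += dt * dS
        I -= dt * dS  # S + I is conserved
        SS += dt * dSS
        SI += dt * dSI
        II += dt * dII
    return S, I, SS, SI, II


S, I, SS, SI, II = sis_pair_approximation(N=1000, n=4, tau=1.0, gamma=0.2, i0=0.1)
print(round(S + I), round(SS + 2 * SI + II))  # 1000 4000
```

The closure is what keeps the system finite: without it, the [SI] equation would depend on triples, whose equations would depend on quadruples, and so on.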
6.
Background
Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks, including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that incorporating predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, prediction of other structural features could plausibly also provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment.
7.
Vassura M, Margara L, Di Lena P, Medri F, Fariselli P, Casadio R. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008, 5(3):357-367
The prediction of a protein's tertiary structure from its residue sequence alone (the so-called protein folding problem) is one of the most challenging problems in structural bioinformatics. We focus on the protein residue contact map: when this map is given, it is possible to reconstruct the 3D structure of the protein backbone. The general problem of recovering a set of 3D coordinates consistent with a given contact map is known as the unit-disk-graph realization problem, and it has recently been proven to be NP-hard. In this paper we describe a heuristic method (COMAR) that reconstructs, at unprecedented speed (3-15 seconds), a 3D model that exactly matches the target contact map of a protein. Working with a non-redundant set of 1760 proteins, we find that the efficiency of finding a 3D model very close to the native structure depends on the threshold value adopted to compute the residue contact map. Contact maps with threshold values from 10 to 18 Ångstroms allow the reconstruction of 3D models that are very similar to the proteins' native structures.
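Contact maps of the kind COMAR takes as input can be computed from coordinates as sketched below. This assumes one representative atom per residue (for example C-alpha) and a strict `< threshold` cutoff; the function name `contact_map` is hypothetical. The toy example shows how raising the threshold from 10 to 18 Å turns a distant pair into a contact.

```python
import math


def contact_map(coords, threshold):
    """Binary residue contact map from 3D coordinates.

    coords:    list of (x, y, z) positions, one per residue (assumed C-alpha)
    threshold: cutoff in Ångstroms; i and j are in contact when distance < threshold
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap


# Toy coordinates (Å): residues 1 and 2 are 16.2 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_map(coords, 10.0)[1][2], contact_map(coords, 18.0)[1][2])  # False True
```

Larger thresholds put more pairs in contact, so the map constrains the reconstruction more tightly, which is consistent with the better models reported for the 10-18 Å range.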
8.
Vassura M, Margara L, Di Lena P, Medri F, Fariselli P, Casadio R. Bioinformatics (Oxford, England), 2008, 24(10):1313-1315
Fault Tolerant Contact Map Reconstruction (FT-COMAR) is a heuristic algorithm for reconstructing the three-dimensional structure of a protein from possibly incomplete (i.e. containing unknown entries) and noisy contact maps. FT-COMAR runs within minutes, allowing its application to large numbers of predictions. AVAILABILITY: http://bioinformatics.cs.unibo.it/FT-COMAR
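Because FT-COMAR accepts maps with unknown entries, one simple way to score a reconstructed model is to compare its contact map with the target only on the known entries. The sketch below assumes maps are nested lists with `True`/`False` for known entries and `None` for unknown ones; `map_agreement` is a hypothetical helper, not part of FT-COMAR.

```python
def map_agreement(predicted, target):
    """Fraction of known target entries (upper triangle) reproduced by a predicted map.

    predicted: n x n nested list of bools (e.g. from a reconstructed 3D model)
    target:    n x n nested list of True / False / None, where None marks an unknown entry
    """
    known = agree = 0
    n = len(target)
    for i in range(n):
        for j in range(i + 1, n):
            t = target[i][j]
            if t is None:
                continue  # unknown entries impose no constraint
            known += 1
            agree += predicted[i][j] == t
    if known == 0:
        raise ValueError("target map has no known entries")
    return agree / known


predicted = [[False, True, False], [True, False, False], [False, False, False]]
target = [[None, True, None], [True, None, True], [None, True, None]]
print(map_agreement(predicted, target))  # 0.5
```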
9.
MOTIVATION: The number of protein families has been estimated to be as small as 1000. A recent study shows that the discovery rate of novel structures deposited in the PDB, and the related rate of increase of SCOP categories, are slowing down. This suggests that the protein structure space will soon be covered, so we may be able to derive most of the remaining structures from known folding patterns. Present tertiary structure prediction methods perform well when a homologous structure is available, but give poorer results when no homologous templates exist. At the same time, some proteins that share twilight-zone sequence identity can form similar folds. Therefore, determining structural similarity without sequence similarity would benefit tertiary structure prediction. RESULTS: The proposed PFRES method for automated protein fold classification from low-identity (<35%) sequences obtains 66.4% and 68.4% accuracy on two test sets, respectively. PFRES obtains 6.3-12.4% higher accuracy than existing methods, and its prediction accuracy is shown to be statistically significantly better than that of competing methods. Our method adopts a carefully designed, ensemble-based classifier and a novel, compact, custom-designed feature representation that includes nearly 90% fewer features than the representation of the most accurate competing method (36 versus 283). The proposed representation combines evolutionary information, via the PSI-BLAST profile-based composition vector, with information extracted from the secondary structure predicted by PSI-PRED. AVAILABILITY: The method is freely available from the authors upon request.
10.
To improve prediction accuracy in the regime where template alignment quality is poor, an updated version of TASSER_2.0, TASSER_WT, was developed. TASSER_WT incorporates more accurate contact restraints from a new method, COMBCON. COMBCON uses confidence-weighted contacts from PROSPECTOR_3.5, from its latest version PROSPECTOR_4, and from a new local structural fragment-based threading algorithm, STITCH, implemented in two variants depending on the expected fragment prediction accuracy. TASSER_WT is tested on 622 Hard proteins, the most difficult targets (incorrect alignments and/or templates and incorrect side-chain contact restraints) in a comprehensive benchmark of 2591 nonhomologous, single-domain proteins ≤200 residues that cover the PDB at 35% pairwise sequence identity. For 454 of the 622 Hard targets, COMBCON provides contact restraints with higher accuracy and more contacts per residue. The more the contact coverage with confidence weight ≥3 (Fwt≥3cov) increases, the more TASSER_WT models improve. When Fwt≥3cov > 1.0 and > 0.4, the average root mean-square deviation of TASSER_WT (TASSER_2.0) models is 4.11 Å (6.72 Å) and 5.03 Å (6.40 Å), respectively. Regarding a structure prediction as successful when the model has a TM-score to the native structure ≥0.4, when Fwt≥3cov > 1.0 and > 0.4, the success rate of TASSER_WT (TASSER_2.0) is 98.8% (76.2%) and 93.7% (81.1%), respectively.
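The success criterion above relies on the TM-score. The sketch below evaluates the usual TM-score sum, TM = (1/L) Σ 1/(1 + (dᵢ/d₀)²) with d₀ = 1.24·(L − 15)^(1/3) − 1.8, for one fixed superposition; the real TM-score program additionally searches for the superposition that maximizes this value. The function names and the 0.5 Å floor on d₀ for very short chains are assumptions made for illustration.

```python
def tm_d0(length):
    """TM-score distance scale d0 (Å) for a target of the given length."""
    d0 = 1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumed floor so very short chains get a positive d0


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: deviations (Å) of the aligned residue pairs after superposition;
    unaligned target residues simply contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def is_success(distances, target_length, cutoff=0.4):
    """Success as defined above: TM-score to native >= cutoff."""
    return tm_score(distances, target_length) >= cutoff


# Half of a 100-residue target aligned perfectly, the rest unaligned.
print(tm_score([0.0] * 50, 100))  # 0.5
```

Because each residue's contribution decays smoothly with distance, a few badly placed residues lower the TM-score much less than they inflate the RMSD.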
11.
In recent years, small-world behavior has been extensively described for proteins represented by the undirected graph defined by inter-residue contacts. With this representation it was possible to compute the average clustering coefficient (C) and characteristic path length (L) of protein structures, and their values were found to be similar to those of graphs with small-world topology. In this comment, we analyze a large set of non-redundant protein structures (1753) and show, by randomly mimicking protein collapse, that the covalent structure of the protein chain contributes significantly to the small-world behavior of inter-residue contact graphs. When protein graphs are generated under constraints similar to those induced by backbone connectivity, their characteristic path lengths and clustering coefficients are indistinguishable from those computed from the real contact maps, showing that L and C values cannot be used for 'protein fingerprinting'. Moreover, we verified that these results are independent of the selected protein representation, residue composition and secondary structure.
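The two quantities discussed above can be computed for any undirected graph with plain breadth-first search. In this sketch the graph is an adjacency dict (residue → set of contacting residues), nodes with fewer than two neighbors contribute a clustering of 0, and path lengths are averaged over connected ordered pairs only; these conventions and the function names are assumptions.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient C of an undirected graph {node: set(neighbors)}."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # convention: nodes with fewer than 2 neighbors contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length L over all connected ordered node pairs (BFS)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())  # dist[src] is 0, so it adds nothing
        pairs += len(dist) - 1
    if pairs == 0:
        raise ValueError("graph has no connected node pairs")
    return total / pairs


# A 3-residue chain with backbone edges only, versus a closed triangle.
chain = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(average_clustering(chain), average_clustering(triangle))  # 0.0 1.0
```

Adding the backbone edges (i, i+1) to every contact graph is what ties L to chain connectivity, which is the effect the comment above argues dominates the small-world statistics.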
12.
One major problem with existing algorithms for predicting protein structural classes is low accuracy for proteins in the α/β and α+β classes. In this study, three novel features were rationally designed to model the differences between proteins from these two classes. Combined with other rationally designed features, they form an 11-dimensional feature vector on which a prediction method was built. With this method, the overall prediction accuracy on the 25PDB dataset was 1.5% higher than that of the previous best-performing method, MODAS. Furthermore, the prediction accuracy for α+β proteins in the 25PDB dataset was 5% higher than that of the previous best-performing method, SCPRED. The prediction accuracies on the D675 and FC699 datasets were also improved.
13.
β-turns are the most common type of non-repetitive structure and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP for two-class β-turn prediction and for prediction of the individual β-turn types, using evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset, BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on two-class prediction of β-turn versus not-β-turn. Furthermore, NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprising all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC=0.50, Qtotal=82.1%, sensitivity=75.6%, PPV=68.8% and AUC=0.864. We have compared our performance to eleven other prediction methods, which obtain Matthews correlation coefficients in the range of 0.17-0.47. For the type-specific β-turn predictions, only types I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain values of 0.36 and 0.31, respectively. CONCLUSION: The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
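The two-class metrics reported above all follow from a confusion matrix of true/false positives and negatives. A minimal sketch with hypothetical counts in the example; `binary_metrics` is an illustrative name, and setting MCC to 0 when a marginal is empty is one common convention rather than anything stated in the abstract.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """MCC, Qtotal, sensitivity and PPV from two-class confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # fraction of true β-turns that are predicted
    ppv = tp / (tp + fp)          # fraction of predicted β-turns that are real
    q_total = (tp + tn) / total   # overall fraction of correct predictions
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "Qtotal": q_total, "sensitivity": sensitivity, "PPV": ppv}


print(binary_metrics(40, 10, 40, 10)["MCC"])  # 0.6
```

MCC is the headline figure here because, unlike Qtotal, it stays near 0 for a predictor that simply labels every residue as the majority class.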
14.
Ashraf Yaseen, Mais Nijim, Brandon Williams, Lei Qian, Min Li, Jianxin Wang, Yaohang Li. BMC Bioinformatics, 2016, 17(8):281
Background
The fluctuation of atoms around their average positions in protein structures provides important information about protein dynamics. This flexibility of protein structures is associated with various biological processes. Predicting residue flexibility from protein sequences is therefore valuable for analyzing the dynamic properties of proteins, which in turn helps in predicting their functions.
Results
In this paper, an approach for improving the accuracy of protein flexibility prediction is introduced. A neural network method for predicting flexibility in 3 states is implemented. The method incorporates sequence and evolutionary information, context-based scores, predicted secondary structures and solvent accessibility, and amino acid properties. Context-based statistical scores are derived using the mean-field potentials approach to describe the different preferences of protein residues for flexibility states, taking their amino acid context into consideration. The 7-fold cross-validated accuracy reached 61% when context-based scores and predicted structural states were incorporated in training the flexibility predictor.
Conclusions
As our computational results show, context-based statistical scores combined with predicted structural states are important features for improving protein flexibility prediction. Our prediction method is implemented as a web service called “FLEXc”, available online at: http://hpcr.cs.odu.edu/flexc.
15.
Benchmarking of TASSER_2.0: an improved protein structure prediction algorithm with more accurate predicted contact restraints
To improve tertiary structure predictions of more difficult targets, the next generation of TASSER, TASSER_2.0, has been developed. TASSER_2.0 incorporates more accurate side-chain contact restraint predictions from a new approach, the composite-sequence method, based on consensus restraints generated by an improved threading algorithm, PROSPECTOR_3.5, which uses computationally evolved and wild-type template sequences as input. TASSER_2.0 was tested on a large-scale, benchmark set of 2591 nonhomologous, single domain proteins ≤200 residues that cover the Protein Data Bank at 35% pairwise sequence identity. Compared with the average fraction of accurately predicted side-chain contacts of 0.37 using PROSPECTOR_3.5 with wild-type template sequences, the average accuracy of the composite-sequence method increases to 0.60. The resulting TASSER_2.0 models are closer to their native structures, with an average root mean-square deviation of 4.99 Å compared to the 5.31 Å result of TASSER. Defining a successful prediction as a model with a root mean-square deviation to native <6.5 Å, the success rate of TASSER_2.0 (TASSER) for Medium targets (targets with good templates/poor alignments) is 74.3% (64.7%) and 40.8% (35.5%) for the Hard targets (incorrect templates/alignments). For Easy targets (good templates/alignments), the success rate slightly increases from 86.3% to 88.4%.
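The contact accuracy figures (0.37 versus 0.60) and the RMSD-based success rate above can be read as simple ratios. The sketch below assumes "accuracy" means the fraction of predicted contacts that are present in the native structure, and that RMSD values have already been computed after superposition; both helpers, `contact_accuracy` and `success_rate`, are hypothetical.

```python
def contact_accuracy(predicted, native):
    """Fraction of predicted residue contacts that are also native contacts."""
    # Treat (i, j) and (j, i) as the same contact.
    pred = {tuple(sorted(p)) for p in predicted}
    nat = {tuple(sorted(p)) for p in native}
    if not pred:
        raise ValueError("no predicted contacts")
    return len(pred & nat) / len(pred)


def success_rate(rmsds, cutoff=6.5):
    """Fraction of models whose RMSD to native (Å) is strictly below cutoff."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)


print(contact_accuracy([(1, 5), (6, 2), (3, 9)], [(5, 1), (2, 6), (4, 10)]))
print(success_rate([4.0, 7.0, 6.4, 6.5]))  # 0.5
```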
16.
Michal Brylinski, Seung Yup Lee, Hongyi Zhou, Jeffrey Skolnick. Journal of Structural Biology, 2011, 173(3):558-569
Exhaustive exploration of molecular interactions at the level of complete proteomes requires efficient and reliable computational approaches to protein function inference. Ligand docking and ranking techniques show considerable promise in their ability to quantify the interactions between proteins and small molecules. Despite the advances in the development of docking approaches and scoring functions, the genome-wide application of many ligand docking/screening algorithms is limited by the quality of the binding sites in theoretical receptor models constructed by protein structure prediction. In this study, we describe a new template-based method for the local refinement of ligand-binding regions in protein models using remotely related templates identified by threading. We designed a Support Vector Regression (SVR) model that selects correct binding site geometries in a large ensemble of multiple receptor conformations. The SVR model employs several scoring functions that impose geometrical restraints on the Cα positions, account for the specific chemical environment within a binding site and optimize the interactions with putative ligands. The SVR score is well correlated with the RMSD from the native structure; in 47% (70%) of the cases, the Pearson’s correlation coefficient is >0.5 (>0.3). When applied to weakly homologous models, the average heavy atom, local RMSD from the native structure of the top-ranked (best of top five) binding site geometries is 3.1 Å (2.9 Å) for roughly half of the targets; this represents a 0.1 (0.3) Å average improvement over the original predicted structure. Focusing on the subset of strongly conserved residues, the average heavy atom RMSD is 2.6 Å (2.3 Å). Furthermore, we estimate the upper bound of template-based binding site refinement using only weakly related proteins to be ~2.6 Å RMSD. This value also corresponds to the plasticity of the ligand-binding regions in distant homologues. 
The Binding Site Refinement (BSR) approach is available to the scientific community as a web server that can be accessed at http://cssb.biology.gatech.edu/bsr/.
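The correlation statistics quoted above (Pearson's r > 0.5 in 47% of cases, > 0.3 in 70%) can be computed from per-target score/RMSD pairs with a few lines of code. This is a plain Pearson coefficient plus a fraction-above-cutoff helper; the function names and toy inputs are illustrative assumptions, not the BSR code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def fraction_above(values, cutoff):
    """Fraction of values strictly greater than cutoff."""
    return sum(v > cutoff for v in values) / len(values)


# Hypothetical per-target correlations between a model score and RMSD.
per_target_r = [0.62, 0.21, 0.55, 0.35]
print(fraction_above(per_target_r, 0.5), fraction_above(per_target_r, 0.3))  # 0.5 0.75
```

In practice each entry of `per_target_r` would be `pearson(scores, rmsds)` over the conformations in one target's ensemble.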
17.
Comparative sequence analysis has been used for many years to study specific questions about the structure and function of proteins. Here we propose a knowledge-based framework in which the maximum likelihood rate of evolution is used to quantify the level of constraint on the identity of a site. Using datasets of rhodopsin-like G-protein-coupled receptors and of alpha- and beta-tubulins, we demonstrate that mapping site rates onto 3D structures provides an excellent tool for pinpointing the functional features shared between orthologous and paralogous proteins. In addition, functional divergence within protein families can be inferred by examining differences between aligned sites in site rates, in the chemical properties of the side chains, or in amino acid usage. Two novel analytical methods are introduced to characterize rate-independent functional divergence. They are tested on a dataset of two classes of HMG-CoA reductases, of which only one class can perform both the forward and the reverse reaction. We show that functionally divergent sites occur in a cluster of sites interacting with the catalytic residues, and that this information should facilitate the design of experimental strategies to directly test the functional properties of residues.
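The maximum likelihood site rates used above require a phylogenetic model. As a much cruder stand-in that only illustrates the idea of per-site constraint, the sketch below computes the Shannon entropy of each alignment column (low entropy suggests a conserved, constrained site). This is a swapped-in technique, not the authors' method; dropping gaps and the name `column_entropy` are assumptions.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    # p * log2(1/p) summed over residue types; 0 for a fully conserved column.
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


# Hypothetical alignment: one sequence per string, columns read with zip.
alignment = ["MKLV", "MKIV", "MRLA", "MKVG"]
print([round(column_entropy(col), 3) for col in zip(*alignment)])  # [0.0, 0.811, 1.5, 1.5]
```

Unlike an ML rate, entropy ignores the tree, so closely related sequences can make a variable site look conserved; that is the main reason the phylogenetic approach is preferred.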
18.
19.
Background
This paper presents a simple method to increase the sensitivity of protein family comparisons by incorporating secondary structure (SS) information. We build upon the effective information-theoretic approach to profile-profile comparison described in [Yona & Levitt 2002]. Our method augments profile columns with PSIPRED secondary structure predictions and assesses statistical similarity using information-theoretic principles.
20.
The helical packing in sperm whale myoglobin has been examined. Using cylindrical coordinates based on each helix axis in turn, the overlap of the side-chain atoms of a helix with the surrounding atoms from other parts of the structure was 2.3 Å, but the distribution was far from uniform, and severe overlap occurred in at least one location for each helix. Simple axial translations or rotations of any helix in the native structure are not permitted motions. Translation perpendicular to the helix axis, in at least one direction, is not restricted by interlocking side chains.
The approach of two helices along the contact normal connecting their axes produces solvent exclusion effects at a distance of about 6 Å from the final position. The solvent-excluded area at such interaction sites is equivalent to a large hydrophobic contribution to the free energy of association. The six principal sites alone account for 40% of the total area change in going from the extended sausage model to the native structure. The mean atom-packing densities for these sites and the standard deviations of these values are similar, and equal to that found for the protein as a whole.
Helices of close-packed spheres form useful approximations to actual peptide helices. The helix of index number four corresponds closely to an α-helix, and the required sphere size corresponds in volume to residues such as leucine or methionine. The packing scheme predicted for such helices corresponds to the three general classes of interactions actually observed.
Using the geometry implied by the close-packed sphere helix, an algorithm is proposed for picking potentially strong helix-helix interaction sites in peptide chains of known sequence. When combined with preliminary secondary structure predictions, this algorithm might usefully restrict the search for these specific types of contact in the docking portion of a general folding program.
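The contact normal mentioned above is the shortest segment joining two helix axes. Treating each axis as an infinite line p + s·u (an assumption; real helices are finite segments), its length follows from the cross product of the axis directions, as in this sketch with hypothetical helper names.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def axis_distance(p1, u, p2, v, eps=1e-9):
    """Length of the contact normal between helix axes p1 + s*u and p2 + t*v (Å)."""
    w = _sub(p2, p1)
    n = _cross(u, v)  # direction of the common perpendicular
    if _norm(n) < eps:
        # Parallel axes: perpendicular distance from p2 to the first axis.
        return _norm(_cross(w, u)) / _norm(u)
    return abs(_dot(w, n)) / _norm(n)


# Two perpendicular axes, one along x through the origin, one along y at z = 10 Å.
print(axis_distance((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 10.0), (0.0, 1.0, 0.0)))  # 10.0
```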