20 similar documents found; search time: 31 ms
1.
Proteins fold either through a two-state (TS) process, with no visible intermediates, or through a multi-state (MS) process,
via at least one intermediate. We analyze sequence-derived factors that determine folding type by introducing a novel
sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features
that hybridize information on amino acid composition, predicted secondary structure and solvent accessibility. FOKIT provides
predictions with an average Matthews correlation coefficient (MCC) between 0.58 and 0.91, measured using out-of-sample tests on
four benchmark datasets. These results are shown to be competitive with or better than those of four modern predictors. We also
show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the
model, which is an important advantage given the limited number of annotated chains. We demonstrate that the inclusion of solvent
accessibility helps to discriminate between the folding kinetic types, and that three of the features constitute statistically
significant markers that differentiate TS and MS folders. We found that an increased content of exposed Trp and buried Leu is
indicative of MS folding, which implies that the exposure or burial of certain hydrophobic residues may play an important role
in the formation of folding intermediates. Our conclusions are supported by two case studies.
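The MCC values quoted above summarize a binary confusion matrix. Below is a minimal Python sketch of the standard formula. It assumes MS folders are treated as the positive class and uses made-up counts. The function name `mcc` is illustrative and not part of FOKIT.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary (e.g. TS vs MS) classifier."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventionally reported as 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion counts, with MS folders as the positive class.
print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
```

MCC ranges from -1 to 1 and stays informative when the two classes are unbalanced. This is one reason it is preferred over raw accuracy for small annotated sets.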
2.
SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Total citations: 4, self-citations: 0, by others: 4
MOTIVATION: Multiple sequence alignment is an essential part of bioinformatics tools for genome-scale studies of genes and their evolutionary relationships. However, making an accurate alignment between remote homologs is challenging. Here, we develop a method, called SPEM, that aligns multiple sequences using pre-processed sequence profiles and predicted secondary structures for pairwise alignment, consistency-based scoring for refinement of the pairwise alignment, and a progressive algorithm for the final multiple alignment. RESULTS: The alignment accuracy of SPEM is compared with that of established methods such as ClustalW, T-Coffee, MUSCLE, ProbCons and PRALINE(PSI) on easy (homologs) and hard (remote homologs) benchmarks. The results indicate that the average sum of pairwise alignment scores given by SPEM is 7-15% higher than those of the compared methods when aligning remote homologs (sequence identity <30%). Its accuracy in aligning homologs (sequence identity >30%) is statistically indistinguishable from that of state-of-the-art techniques such as ProbCons or MUSCLE 6.0. AVAILABILITY: The SPEM server and its executables are available at http://theory.med.buffalo.edu.
3.
Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information. Total citations: 1, self-citations: 0, by others: 1
Gianluca Pollastri, Alberto JM Martin, Catherine Mooney, Alessandro Vullo. BMC Bioinformatics, 2007, 8(1):201
Background
Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio.
4.
Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Total citations: 28, self-citations: 0, by others: 28
The effect of training a neural network secondary structure prediction algorithm with different types of multiple sequence alignment profiles derived from the same sequences is shown to yield accuracies ranging from 70.5% to 76.4%. The best accuracy of 76.4% (standard deviation 8.4%) is 3.1% (Q3) and 4.4% (SOV2) better than the PHD algorithm run on the same set of 406 sequence-non-redundant proteins that were not used to train either method. Residues predicted by the new method with a confidence value of 5 or greater have an average Q3 accuracy of 84% and cover 68% of the residues. Relative solvent accessibility based on a two-state model is predicted at 76.2%, 79.8% and 86.6% accuracy for the 25%, 5% and 0% accessibility thresholds, respectively. The sources of the improvements obtained from training with different representations of the same alignment data are described in detail. The new Jnet prediction method resulting from this study is available in the Jpred secondary structure prediction server, and as a stand-alone computer program from: http://barton.ebi.ac.uk/. Proteins 2000;40:502-511.
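Q3, the headline measure above, is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. Below is a minimal sketch. It assumes the H/E/C alphabet and pools residues within a single chain. The function `q3` is hypothetical and not part of Jnet.

```python
def q3(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy (Q3), as a percentage.

    Both strings hold one letter per residue: H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHHCCEEE", "HHHHCEEC"))  # 75.0
```

Whether Q3 is averaged per protein or pooled over all residues changes the reported number. This sketch pools within one chain only.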
5.
6.
Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure. Total citations: 1, self-citations: 0, by others: 1
MOTIVATION: Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins. They play critical roles in stabilizing protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins, which could have potentially important applications. RESULTS: We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence. It uses a support vector regression (SVR) approach based on multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. Using 4-fold cross-validation on a well-defined non-homologous dataset, and averaging over proteins with two to five disulfide bridges, our method achieves prediction accuracies of 74.4% at the protein level and 77.9% at the cysteine-pair level. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. The encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure is shown to significantly improve prediction accuracy, enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to current algorithms that should be useful in computationally assigning disulfide connectivity patterns. It also helps in the annotation of protein sequences generated by large-scale whole-genome projects. AVAILABILITY: The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
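The abstract does not say how pairwise SVR outputs are turned into a connectivity pattern. One common choice is to pick the perfect matching of bonded cysteines with the highest total pair score. This is assumed here and is not necessarily the authors' procedure. Below is an exhaustive Python sketch; the function `best_connectivity` and the score values are hypothetical. Exhaustive search is practical for the two to five bridges considered above, since ten cysteines admit only 945 matchings.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

def best_connectivity(n_cys: int, score: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return the highest-scoring perfect matching of cysteines 0..n_cys-1.

    score[(i, j)] with i < j is a hypothetical bonding score for that pair
    (for example a regression output); pairs missing from the dict score 0.
    Assumes every cysteine in the list takes part in a disulfide bond.
    """
    if n_cys % 2:
        raise ValueError("need an even number of bonded cysteines")

    @lru_cache(maxsize=None)
    def solve(remaining: Tuple[int, ...]) -> Tuple[float, Tuple[Pair, ...]]:
        # Pair the first unmatched cysteine with each possible partner, recurse on the rest.
        if not remaining:
            return 0.0, ()
        first, rest = remaining[0], remaining[1:]
        best: Tuple[float, Tuple[Pair, ...]] = (float("-inf"), ())
        for k, partner in enumerate(rest):
            sub_total, sub_pairs = solve(rest[:k] + rest[k + 1:])
            total = score.get((first, partner), 0.0) + sub_total
            if total > best[0]:
                best = (total, ((first, partner),) + sub_pairs)
        return best

    total, pairs = solve(tuple(range(n_cys)))
    return total, list(pairs)

scores = {(0, 1): 0.2, (2, 3): 0.3, (0, 2): 0.9, (1, 3): 0.8, (0, 3): 0.1, (1, 2): 0.1}
total, pairs = best_connectivity(4, scores)
print(pairs)  # [(0, 2), (1, 3)]
```

For larger cysteine counts, a maximum-weight matching algorithm (for example Edmonds' blossom algorithm) would replace the exhaustive search.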
7.
Chu W, Ghahramani Z, Podtelezhnikov A, Wild DL. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2006, 3(2):98-113
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction that incorporates multiple sequence alignment profiles with the aim of improving predictive performance. The segmental model is a generalization of the hidden Markov model in which a hidden state generates segments of variable length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and that the generalization performance is promising. By incorporating information from long-range interactions in β-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.
8.
It is now possible to compare life forms at high levels of detail and completeness due to the increasing availability of whole genomes from all three domains. However, exploration of interesting hypotheses requires the ability to recognize a correspondence between proteins that may since have diverged beyond the threshold of detection by sequence-based methods. Since protein structure is far better conserved than protein sequence, structural information can enhance detection sensitivity, and this is the basis for the field of structural genomics. Demonstrating the effectiveness of this approach, we identify two important but previously elusive Archaeal enzymes: a homolog of dihydropteroate synthase and a thymidylate synthase. The former is especially noteworthy in that no Archaeal homolog of a bacterial folate biosynthetic enzyme has been found to date. Experimental confirmation of the deduced activity of both enzymes is described. Identification of two different proteins was attempted deliberately to help allay concern that predictive success is merely a lucky accident.
9.
Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L. Nucleic Acids Research, 2003, 31(13):3804-3807
ORFeus is a fully automated, sensitive protein sequence similarity search server available to the academic community via the Structure Prediction Meta Server (http://BioInfo.PL/Meta/). The goal of developing ORFeus was to increase the sensitivity of detecting distantly related protein families. Predicted secondary structure information was added to the information about sequence conservation and variability, a technique known from hybrid threading approaches. The accuracy of the meta profiles created this way is compared with that of profiles containing only sequence information, and with the standard approach of aligning a single sequence with a profile. Additionally, aligning meta profiles is more sensitive in detecting remote homology between protein families than aligning two sequence-only profiles or aligning a profile with a sequence. The specificity of the alignment score is improved in the lower specificity range compared with the robust sequence-only profiles.
10.
The secondary structures of the RNAs from the signal recognition particle, termed SRP-RNA, were derived by comparative analysis of an alignment of 39 sequences. The models are minimal in that they include only base pairs for which there is comparative evidence. The structures refine earlier versions and include a new short helix.
11.
Pugalenthi G, Kandaswamy KK, Chou KC, Vivekanandan S, Kolatkar P. Protein and Peptide Letters, 2012, 19(1):50-56
Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22,006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25% and 50%). The prediction accuracies for the 0%, 5%, 10%, 25% and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07%, respectively. Further, a comparison of RSARF with other methods on a benchmark dataset of 20 proteins shows that our approach is useful for predicting residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
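Buried/exposed labels such as these are derived by thresholding relative solvent accessibility (RSA). Below is a minimal sketch with made-up RSA values. It assumes a residue counts as exposed when its RSA is strictly above the threshold; conventions differ on the boundary case. The helper names are hypothetical and not part of RSARF.

```python
from typing import Dict, Sequence

# Thresholds from the abstract, in percent relative solvent accessibility.
THRESHOLDS = (0.0, 5.0, 10.0, 25.0, 50.0)

def two_state_labels(rsa: Sequence[float], threshold: float) -> str:
    """One label per residue: 'e' (exposed) if RSA > threshold, else 'b' (buried)."""
    return "".join("e" if value > threshold else "b" for value in rsa)

def two_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose buried/exposed label matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("labels must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def labels_per_threshold(rsa: Sequence[float]) -> Dict[float, str]:
    """Label the same residues at every threshold."""
    return {t: two_state_labels(rsa, t) for t in THRESHOLDS}

observed_rsa = [0.0, 3.2, 12.5, 40.0, 71.8]  # hypothetical values for five residues
for threshold, labels in labels_per_threshold(observed_rsa).items():
    print(f"{threshold:>4}% -> {labels}")
```

Each threshold defines a separate binary task. This is why accuracy differs by threshold: the class balance shifts as the cutoff moves.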
12.
MOTIVATION: The number of protein families has been estimated to be as small as 1000. A recent study shows that the discovery of novel structures deposited into the PDB, and the related rate of increase of SCOP categories, are slowing down. This indicates that protein structure space will soon be covered, and thus we may be able to derive most of the remaining structures by using known folding patterns. Present tertiary structure prediction methods perform well when a homologous structure is available, but give poorer results when no homologous templates exist. At the same time, some proteins that share twilight-zone sequence identity can form similar folds. Therefore, determining structural similarity without sequence similarity would be beneficial for predicting tertiary structures. RESULTS: The proposed PFRES method for automated protein fold classification from low-identity (<35%) sequences obtains 66.4% and 68.4% accuracy on two test sets, respectively. PFRES obtains 6.3-12.4% higher accuracy than existing methods, and its prediction accuracy is shown to be statistically significantly better than that of competing methods. Our method adopts a carefully designed ensemble-based classifier and a novel, compact, custom-designed feature representation. This representation includes nearly 90% fewer features than that of the most accurate competing method (36 versus 283). It combines evolutionary information from the PSI-BLAST profile-based composition vector with information extracted from the secondary structure predicted by PSI-PRED. AVAILABILITY: The method is freely available from the authors upon request.
13.
I. I. Litvinov, M. Yu. Lobanov, A. A. Mironov, A. V. Finkelshtein, M. A. Roytberg. Molecular Biology, 2006, 40(3):474-480
The most popular algorithms for pairwise alignment of protein primary structures (the Smith-Waterman (SW) algorithm, FASTA, BLAST, etc.) analyze only the amino acid sequence. The SW algorithm is the most accurate, yielding alignments that agree best with superpositions of the corresponding protein spatial structures. However, even the SW algorithm fails to reproduce the structural alignment when sequence identity is lower than 30%. The objective of this work was to develop a new, more accurate algorithm that takes the secondary structure of proteins into account. The maximal-weight alignments generated by this algorithm, with secondary structure taken into account, proved more accurate than SW alignments. Accuracy is defined as the fraction of positions in a reference alignment, obtained by superimposing the protein spatial structures, that are reproduced. For sequences with less than 30% identity, the new algorithm reaches 58% accuracy, versus 35% for the SW algorithm. The accuracy of the new algorithm is much the same whether secondary structures are determined experimentally or predicted theoretically. Hence, the algorithm is applicable to proteins with unknown spatial structures. The program is available at ftp://194.149.64.196/STRUSWER/.
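The idea of letting secondary structure influence a Smith-Waterman alignment can be sketched as follows. This is not the published algorithm, whose scoring scheme the abstract does not give. The sketch assumes a simple match/mismatch score, a linear gap penalty and a hypothetical bonus whenever two aligned residues share the same secondary-structure state. Real aligners typically use a substitution matrix such as BLOSUM62 and affine gaps.

```python
def sw_score_with_ss(seq_a: str, ss_a: str, seq_b: str, ss_b: str,
                     match: int = 2, mismatch: int = -1,
                     gap: int = -2, ss_bonus: int = 1) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    ss_a and ss_b hold one secondary-structure letter per residue (e.g. H/E/C);
    all scoring parameters are illustrative assumptions.
    """
    if len(seq_a) != len(ss_a) or len(seq_b) != len(ss_b):
        raise ValueError("each sequence needs one secondary-structure label per residue")
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best local score ending at (i, j)
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if ss_a[i - 1] == ss_b[j - 1]:
                s += ss_bonus  # reward agreement of secondary-structure states
            h[i][j] = max(0,                      # start a new local alignment
                          h[i - 1][j - 1] + s,    # align residue i with residue j
                          h[i - 1][j] + gap,      # gap in seq_b
                          h[i][j - 1] + gap)      # gap in seq_a
            best = max(best, h[i][j])
    return best

print(sw_score_with_ss("ACDE", "HHHH", "ACDE", "HHHH"))  # 12
print(sw_score_with_ss("ACDE", "HHHH", "ACDE", "EEEE"))  # 8
```

The sketch returns only the score. Recovering the alignment itself requires a traceback from the best cell, which is omitted here for brevity.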
14.
To explore the spatial organization and functional dynamics of the citrate transport protein (CTP), a nitroxide scan was carried out along 22 consecutive residues within the fourth transmembrane domain (TMDIV). This domain has been implicated as being of unique importance to the CTP mechanism for two reasons: (i) it contains two intramembranous positive charges that are essential for CTP function, and (ii) it has a transmembrane aqueous surface, which likely corresponds to a portion of the citrate translocation pathway. The sequence-specific variation in the mobilities of the introduced nitroxides and their accessibilities to molecular O2 reveal an α-helical conformation along the sequence. The accessibilities to NiEDDA are out of phase with the accessibilities to O2, indicating that one face of the helix is solvated by the lipid bilayer while the other is solvated by an aqueous environment. A gradient of NiEDDA accessibility is observed along the helix surface facing the aqueous phase, and the EPR spectral line shapes at these sites indicate considerable motional restriction. In the context of the model in which TMDIV lines the translocation pathway, these data suggest a barrier to passive diffusion through the pathway. This paper reports the first use of site-directed spin labeling to study mitochondrial transporter structure.
15.
Background
Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks, including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that incorporating predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. It is, however, plausible that predicting other structural features could also improve alignment. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment.
16.
Carugo O. Protein Engineering, 2000, 13(9):607-609
The solvent accessibility of each residue is predicted on the basis of the protein sequence. A set of 338 monomeric, non-homologous and high-resolution protein crystal structures is used as a learning set and a jackknife procedure is applied to each entry. The prediction is based on the comparison of the observed and the average values of the solvent-accessible area. It appears that the prediction accuracy is significantly improved by considering the residue types preceding and/or following the residue whose accessibility must be predicted. In contrast, the separate treatment of different secondary structural types does not improve the quality of the prediction. It is furthermore shown that the residue accessibility is much better predicted in small than in larger proteins. Such a discrepancy must be carefully considered in any algorithm for predicting residue accessibility.
17.
In this study, we propose a novel method to predict the solvent accessible surface areas of transmembrane residues. For both transmembrane alpha-helix and beta-barrel residues, the correlation coefficients between the predicted and observed accessible surface areas are around 0.65. On the basis of predicted accessible surface areas, residues exposed to the lipid environment or buried inside a protein can be identified by using certain cutoff thresholds. We have extensively examined our approach based on different definitions of accessible surface areas and a variety of sets of control parameters. Given that experimentally determining the structures of membrane proteins is very difficult and membrane proteins are actually abundant in nature, our approach is useful for theoretically modeling membrane protein tertiary structures, particularly for modeling the assembly of transmembrane domains. This approach can be used to annotate the membrane proteins in proteomes to provide extra structural and functional information.
18.
An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein for the set. The algorithm is a heuristic in that it computes an approximation to the optimal multiple structure alignment that minimizes the sum of the pairwise distances between the protein structures. The algorithm chooses an input protein as the initial consensus and computes a correspondence between the protein structures (which are represented as sets of unit vectors) using an approach analogous to the center-star method for multiple sequence alignment. From this correspondence, a set of rotation matrices (optimal for the given correspondence) is derived to align the structures and derive the new consensus. The process is iterated until the sum of pairwise distances converges. The computation of the optimal rotations is itself an iterative process that both makes use of the current consensus and generates simultaneously a new one. This approach is based on an interesting result that allows the sum of all pairwise distances to be represented compactly as distances to the consensus. Experimental results on several protein families are presented, showing that the algorithm converges quite rapidly.
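The "interesting result" is not spelled out in the abstract. A standard identity of this kind holds for squared Euclidean distances; it is assumed here for illustration and is not necessarily the authors' derivation. For points (or unit vectors) $\mathbf{x}_1, \dots, \mathbf{x}_n$ with mean $\bar{\mathbf{x}}$, the sum of all pairwise squared distances equals $n$ times the sum of squared distances to the mean. Minimizing one is therefore equivalent to minimizing the other:

```latex
\begin{aligned}
\sum_{1 \le i < j \le n} \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2
  &= \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2
   = n \sum_{i=1}^{n} \lVert \mathbf{x}_i \rVert^2 - \Bigl\lVert \sum_{i=1}^{n} \mathbf{x}_i \Bigr\rVert^2 \\
  &= n \sum_{i=1}^{n} \lVert \mathbf{x}_i - \bar{\mathbf{x}} \rVert^2,
  \qquad \bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i .
\end{aligned}
```

If the consensus is renormalized to unit vectors, as a structure built from unit vectors might require, the mean above is only an approximation to it.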
19.
Scott Montgomerie, Shan Sundararaj, Warren J Gallin, David S Wishart. BMC Bioinformatics, 2006, 7(1):301-13
Background
The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high.
20.
We report an investigation of how much protein structural information can be obtained using a site-directed fluorescence labeling (SDFL) strategy. In our experiments, we used 21 consecutive single-cysteine substitution mutants in T4 lysozyme (residues T115-K135), located in a helix-turn-helix motif. The mutants were labeled with the fluorescent probe monobromobimane and subjected to an array of fluorescence measurements. Thermal stability measurements show that introduction of the label is substantially perturbing only when it is located at buried residue sites. At buried sites (solvent surface accessibility <40 Å²), the destabilizations are between 3 and 5.5 kcal/mol, whereas at more exposed sites, ΔΔG values of ≤1.5 kcal/mol are obtained. We explored several fluorescence parameters: excitation λmax, emission λmax, fluorescence lifetime, quantum yield and steady-state anisotropy. Of these, the emission λmax and the steady-state anisotropy values most accurately reflect the solvent surface accessibility at each site, as calculated from the crystal structure of cysteine-less T4 lysozyme. The parameters we identify allow each site to be classified as buried, partially buried or exposed. We find that the variations in these parameters as a function of residue number reflect the sequence-specific secondary structure, the determination of which is a key step in modeling a protein of unknown structure.