Similar articles
20 similar articles found.
1.
An integrated family of amino acid sequence analysis programs   Cited 12 times (self-citations: 0; citations by others: 12)
In recent years, abundant sequence data has become available due to the rapid progress in protein and DNA sequencing techniques. The exact three-dimensional structures, however, are available only for a fraction of proteins with known sequences. For many purposes the primary amino acid sequence of a protein can be used directly to predict important structural parameters. However, mathematical presentation of the calculated values often makes interpretation difficult, especially if many proteins must be analysed and compared. Here we introduce a broad-based, user-defined analysis of amino acid sequence information. The program package is based on published algorithms and is designed to access standard protein databases, calculate hydropathy, surface probability and flexibility values, and perform secondary structure predictions. The data output is in an ‘easy-to-read’ graphic format, and several parameters can be superimposed within a single plot to simplify data interpretation. Additionally, this package includes a novel algorithm for the prediction of potential antigenic sites. Thus the software package presented here offers a powerful means of analysing an amino acid sequence for structure/function studies as well as antigenic site analyses. These algorithms were written to function in context with the UWGCG (University of Wisconsin Genetics Computer Group) program collection, and are now distributed within that package. Received on March 20, 1987; accepted on September 4, 1987.
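Hydropathy plots of the kind described above are typically produced by averaging a per-residue hydropathy scale over a sliding window. The sketch below uses the Kyte-Doolittle scale as one common choice; the window size of 9 and the function name are assumptions for illustration, and this is not the UWGCG implementation.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window along seq.

    Value i corresponds to the window centred on residue i + window // 2
    (0-based). Non-standard residue letters raise KeyError.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    profile = hydropathy_profile("MKTLLILAVVAAALA", window=5)
    print([round(v, 2) for v in profile])
```

Plotting several such profiles (hydropathy, flexibility, surface probability) on a shared residue axis gives the superimposed view the abstract describes.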

2.
The secondary structures of rabbit liver microsomal cytochrome P-450 LM2, rat liver microsomal cytochromes P-450b and P-450e (phenobarbital-inducible), and rat liver microsomal cytochromes P-450c and P-450d (3-methylcholanthrene-inducible) were predicted by combining two methods: (i) identifying the transmembrane segments of integral membrane proteins, and (ii) statistically predicting the secondary structure of globular proteins. The results are similar for all phenobarbital-inducible enzymes and make it possible to construct two structural models, with seven or with four transmembrane alpha-helices. The cytochromes of the second group obviously form a second structural family with four membrane-spanning alpha-helices. In both cases, a large ectodomain with several consecutive alpha-helices, which may provide the heme-binding pocket, protrudes from the membrane.

3.
Predicting the three-dimensional structure of proteins is a difficult task. In the last few years several approaches have been proposed for this task, taking into account different chemical and physical properties of proteins. As a result, a growing number of protein structure prediction tools is becoming available, some of them specialized in particular aspects of the prediction or in particular categories of proteins; however, they are still not sufficiently accurate and reliable for predicting all kinds of proteins. In this context, it is useful to apply different prediction tools jointly and combine their results to improve prediction quality. However, several problems must be solved to make this a viable possibility. This paper proposes a framework and a tool that allow: (i) definition of a common reference application domain for different prediction tools; (ii) characterization of prediction tools by evaluating quality parameters; (iii) characterization of the performance of a team of predictors jointly applied to a prediction problem; (iv) selection of the best team for a prediction problem; and (v) integration of the results of the predictors in the team to obtain a single prediction. A system implementing the steps of the proposed framework (CooPPS) has been developed, and several experiments testing the effectiveness of the proposed approach have been carried out.
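Step (v), integrating the results of a team into one prediction, can be illustrated with the simplest possible combiner: a per-residue majority vote over secondary-structure strings. This is a minimal sketch, assuming each predictor emits an H/E/C string of equal length; it is not the CooPPS integration method, whose details the abstract does not give.

```python
from collections import Counter


def majority_vote(predictions: list[str]) -> str:
    """Combine per-residue secondary-structure strings (H/E/C) by majority vote.

    All strings must have equal length. Ties are broken in favour of the
    predictor listed first, so list order acts as a priority order.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("predictions must have equal length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # first predictor whose label reaches the top count wins ties
        consensus.append(next(label for label in column if counts[label] == best))
    return "".join(consensus)


if __name__ == "__main__":
    print(majority_vote(["HHHC", "HHEC", "CHEC"]))
```

A weighted vote, with weights taken from each predictor's measured quality parameters (step ii), would be a natural refinement.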

4.
TESE is a web server for generating test sets of protein sequences and structures that fulfil a number of different criteria. At least three use cases can be envisaged: (i) benchmarking of novel methods; (ii) test sets tailored to special needs; and (iii) extension of available datasets. The CATH structure classification is used to control structural/sequence redundancy, and a variety of structural quality parameters can be used to interactively select protein subsets with specific characteristics, e.g. all X-ray structures of alpha-helical repeat proteins with more than 120 residues and resolution <2.0 Å. The output includes FASTA-formatted sequences, PDB files and a clickable HTML index file containing images of the selected proteins. Multiple subsets for cross-validation are also supported. AVAILABILITY: The TESE server is available for non-commercial use at URL: http://protein.bio.unipd.it/tese/.
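The example query above (X-ray, more than 120 residues, resolution <2.0 Å) and the cross-validation subsets can be sketched as a filter followed by a k-way split. The `Entry` record, its field names, and the round-robin splitting rule are hypothetical, not TESE's schema or algorithm; the CATH-based redundancy control and the fold criterion are omitted.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: str      # hypothetical identifier
    method: str        # experimental method, e.g. "X-RAY"
    length: int        # number of residues
    resolution: float  # in Å (nan when not applicable)


def select(entries, min_length=120, max_resolution=2.0):
    """Keep X-ray entries longer than min_length with resolution < max_resolution."""
    return [e for e in entries
            if e.method == "X-RAY"
            and e.length > min_length
            and e.resolution < max_resolution]


def k_fold_subsets(items, k):
    """Split items into k disjoint, near-equal subsets (round-robin)."""
    if k < 1:
        raise ValueError("k must be positive")
    return [items[i::k] for i in range(k)]


if __name__ == "__main__":
    entries = [
        Entry("entry1", "X-RAY", 150, 1.8),
        Entry("entry2", "NMR", 200, float("nan")),
        Entry("entry3", "X-RAY", 100, 1.5),
        Entry("entry4", "X-RAY", 300, 2.5),
        Entry("entry5", "X-RAY", 130, 1.2),
    ]
    chosen = select(entries)
    print([e.entry_id for e in chosen])
    print(k_fold_subsets([e.entry_id for e in chosen], 2))
```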

5.
The development of efficient strategies and automation represent important milestones in rapid structure determination efforts in proteomics research. In this context, we present AUTOBA (Automatic Backbone Assignment), an efficient algorithm designed to automate the assignment protocol based on the HN(C)N suite of experiments. Depending on the spectral dispersion, the user can record 2D or 3D versions of the experiments for assignment. The algorithm takes as inputs: (i) the protein primary sequence and (ii) peak lists from the user-defined HN(C)N suite of experiments. The output consists of HN, 15N, Cα and C′ assignments (in the common BMRB format) for the individual residues along the polypeptide chain. The success of the algorithm has been demonstrated not only with experimental spectra recorded on two small globular proteins, ubiquitin (76 aa) and M-crystallin (85 aa), but also with simulated spectra of 27 other proteins using assignment data from the BMRB.

6.
Prediction of protein structural class is an important topic in molecular biology, because it not only provides useful information about the structure itself but can also greatly stimulate the characterization of many other protein features that may be closely correlated with biological function. In this paper, LogitBoost, a recently developed boosting algorithm, is introduced for predicting protein structural classes. It performs classification using a regression scheme as the base learner, can handle multi-class problems, and is particularly good at coping with noisy data. LogitBoost was shown to outperform support vector machines in predicting the structural classes of a given dataset, indicating that the new classifier is very promising. It is anticipated that the power to predict protein structural classes, as well as many other bio-macromolecular attributes, will be further strengthened if LogitBoost and other existing algorithms can be effectively combined.
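The phrase "a regression scheme as the base learner" refers to LogitBoost fitting, in each round, a weighted least-squares regressor to a working response. The sketch below is the binary variant (Friedman, Hastie & Tibshirani, 2000) with regression stumps on a single numeric feature; the paper's multi-class setting, its base learner and its features are not reproduced, and the clipping bound of 4 is a common but assumed safeguard.

```python
import math


def fit_stump(x, z, w):
    """Weighted least-squares regression stump on a 1-D feature.

    Returns (threshold, left_value, right_value).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        if x[left[-1]] == x[right[0]]:
            continue  # cannot split between equal feature values
        wl = sum(w[i] for i in left)
        wr = sum(w[i] for i in right)
        ml = sum(w[i] * z[i] for i in left) / wl
        mr = sum(w[i] * z[i] for i in right) / wr
        err = (sum(w[i] * (z[i] - ml) ** 2 for i in left)
               + sum(w[i] * (z[i] - mr) ** 2 for i in right))
        if best is None or err < best[0]:
            best = (err, (x[left[-1]] + x[right[0]]) / 2, ml, mr)
    if best is None:  # all feature values equal: constant fit
        m = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def logitboost(x, y, rounds=20):
    """Binary LogitBoost; y holds labels in {0, 1}. Returns the fitted stumps."""
    F = [0.0] * len(x)
    stumps = []
    for _ in range(rounds):
        p = [1.0 / (1.0 + math.exp(-2.0 * f)) for f in F]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]
        # working response z = (y - p) / (p (1 - p)), clipped for stability
        z = [max(-4.0, min(4.0, (yi - pi) / wi)) for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(x, z, w)
        stumps.append(stump)
        F = [f + 0.5 * predict_stump(stump, xi) for f, xi in zip(F, x)]
    return stumps


def predict_proba(stumps, xi):
    """P(y = 1 | x) = 1 / (1 + exp(-2 F(x))), with F = 0.5 * sum of stump outputs."""
    F = 0.5 * sum(predict_stump(s, xi) for s in stumps)
    return 1.0 / (1.0 + math.exp(-2.0 * F))


if __name__ == "__main__":
    model = logitboost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1], rounds=10)
    print(round(predict_proba(model, 1), 3), round(predict_proba(model, 6), 3))
```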

7.
Deleterious mutation prediction in the secondary structure of RNAs   Cited 1 time (self-citations: 0; citations by others: 1)
Barash D. Nucleic Acids Research, 2003, 31(22): 6578-6584

8.
Several subtypes of the alpha-adrenoceptor have been identified; however, knowledge of the three-dimensional structures of such membrane proteins is limited, and no crystal structure of an alpha-adrenoceptor is available to date. We have developed and analysed homology models of the alpha1A-adrenoceptor subtype based on the bovine rhodopsin crystal structure (PDB 1L9H). Subsequent structural refinement was performed through molecular dynamics simulations using the Amber 7 suite of programs, with a biphasic H2O/CHCl3/H2O cell used to mimic the receptor's natural membrane environment.

9.
This article reviews recent work towards modelling protein folding pathways using a bioinformatics approach. Statistical models have been developed for sequence-structure correlations in proteins at five levels of structural complexity: (i) short motifs; (ii) extended motifs; (iii) nonlocal pairs of motifs; (iv) 3-dimensional arrangements of multiple motifs; and (v) global structural homology. We review statistical models, including sequence profiles, hidden Markov models (HMMs) and interaction potentials, for the first four levels of structural detail. The I-sites (folding Initiation sites) Library models short local structure motifs. Each succeeding level has a statistical model, as follows: HMMSTR (HMM for STRucture) is an HMM for extended motifs; HMMSTR-CM (Contact Maps) is a model for pairwise interactions between motifs; and SCALI-HMM (HMMs for Structural Core ALIgnments) is a set of HMMs for the spatial arrangements of motifs. The parallels between the statistical models and theoretical models for folding pathways are discussed in this article; however, global sequence models are not discussed because they have been extensively reviewed elsewhere. The data used and algorithms presented in this article are available at http://www.bioinfo.rpi.edu/~bystrc/ (click on "servers" or "downloads") or by request to bystrc@rpi.edu.
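Decoding the most probable hidden-state path is the basic operation behind HMM-based structure models such as those reviewed above. The sketch below is generic Viterbi decoding in log space on a toy two-state model; the states "H"/"E", the two-letter alphabet and all probabilities are hypothetical, and HMMSTR's actual states and parameters are far richer.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Most probable hidden-state path for obs under a discrete HMM.

    start[s], trans[s][t] and emit[s][o] are probabilities; the computation
    runs in log space to avoid underflow, and zero probabilities become -inf.
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    # score[s]: best log-probability of any path ending in state s
    score = {s: log(start[s]) + log(emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for t in states:
            best_s = max(states, key=lambda s: prev[s] + log(trans[s][t]))
            score[t] = prev[best_s] + log(trans[best_s][t]) + log(emit[t][o])
            ptr[t] = best_s
        back.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], score[last]


if __name__ == "__main__":
    states = ["H", "E"]
    start = {"H": 0.5, "E": 0.5}
    trans = {"H": {"H": 0.8, "E": 0.2}, "E": {"H": 0.2, "E": 0.8}}
    emit = {"H": {"a": 0.9, "b": 0.1}, "E": {"a": 0.1, "b": 0.9}}
    print("".join(viterbi("aaabbb", states, start, trans, emit)[0]))
```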

10.
We present a new method, secondary structure prediction by deviation parameter (SSPDP), for predicting the secondary structure of proteins from amino acid sequence. Deviation parameters (DPs) for amino acid singlets, doublets and triplets were computed with respect to the secondary structural elements of proteins, based on the secondary structures assigned by DSSP (Dictionary of Secondary Structure of Proteins) for 408 selected nonhomologous proteins. Amino acid triplets not found in the selected dataset are assigned a DP value of zero with respect to each secondary structural element. A total of 15,432 parameters were generated out of 25,260 possible parameters; the DP set is complete for amino acid singlets and doublets and partially complete for triplets. These parameters were used to predict secondary structural elements from amino acid sequence. The secondary structures predicted by SSPDP were compared with those of a single-sequence method (NNPREDICT) and a multiple-sequence method (PHD). For the proteins in the selected dataset, the average prediction accuracy for α-helices was 57%, 44% and 69% for SSPDP, NNPREDICT and PHD, respectively; for β-strands it was 69%, 21% and 53%, respectively. This indicates that our method (SSPDP) performs as well as PHD and much better than NNPREDICT.
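The figure of 25,260 possible parameters is consistent with one parameter per amino acid singlet, doublet and triplet for each of three secondary-structure classes. The three-class reading (helix, strand, coil) is our inference from the arithmetic, not stated in the abstract; the check below reproduces the count and the fraction actually populated.

```python
# Possible deviation parameters, assuming one parameter per amino acid
# n-gram (n = 1, 2, 3) per secondary-structure class. The three-class
# assumption (helix, strand, coil) is inferred, not taken from the paper.
N_AMINO_ACIDS = 20
N_CLASSES = 3
GENERATED = 15_432  # parameters reported as generated

per_class = sum(N_AMINO_ACIDS ** n for n in (1, 2, 3))  # 20 + 400 + 8000
total = per_class * N_CLASSES
coverage = GENERATED / total

print(per_class, total, f"{coverage:.1%}")  # 8420 25260 61.1%
```

Since singlets and doublets are complete (3 × 420 = 1,260 parameters), the remaining 14,172 generated parameters are triplets, out of 24,000 possible.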

11.
Li X, Jacobson MP, Friesner RA. Proteins, 2004, 55(2): 368-382
We have developed a new method for predicting helix positions in globular proteins that is intended primarily for comparative modeling and other applications where high precision is required. Unlike helix packing algorithms designed for ab initio folding, we assume that knowledge is available about the qualitative placement of all helices. However, even among homologous proteins, the corresponding helices can demonstrate substantial differences in positions and orientations, and for this reason, improperly positioned helices can contribute significantly to the overall backbone root-mean-square deviation (RMSD) of comparative models. A helix packing algorithm for use in comparative modeling must obtain high precision to be useful, and for this reason we utilize an all-atom protein force field (OPLS) and a Generalized Born continuum solvent model. To reduce the computational expense associated with using a detailed, physics-based energy function, we have developed new hierarchical and multiscale algorithms for sampling the helices and flanking loops. We validate the method using a test suite of 33 cases, which are drawn from a diverse set of high-resolution crystal structures. The helix positions are reproduced with an average backbone RMSD of 0.6 Å, while the average backbone RMSD of the complete loop-helix-loop region (i.e., the helix with the surrounding loops, which are also repredicted) is 1.3 Å.
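The accuracy figures above are backbone RMSDs. A minimal sketch of the RMSD formula over matched atom coordinates follows; it assumes the predicted and reference structures are already in a common frame (for instance after fitting the rest of the protein) and performs no superposition. Whether the authors superimposed the helix itself is not stated in the abstract.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    No superposition is performed: coordinates are assumed to share a frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    model = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]  # rigid 1 Å shift along x
    print(rmsd(reference, model))
```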

12.
The genomes of many organisms have been sequenced in the last 5 years. Typically about 30% of predicted genes from a newly sequenced genome cannot be given functional assignments using sequence comparison methods. In these situations, three-dimensional structural predictions combined with a suite of computational tools can suggest possible functions for these hypothetical proteins. Suggesting functions may allow better interpretation of experimental data (e.g., microarray data and mass spectrometry data) and help experimentalists design new experiments. In this paper, we focus on three hypothetical proteins of Shewanella oneidensis MR-1 that microarray experiments suggest are related to iron transport/metabolism. The threading program PROSPECT was used for protein structure prediction and functional annotation, in conjunction with literature searches and other computational tools. These tools were used to perform transmembrane domain prediction, coiled-coil prediction, signal peptide prediction, subcellular localization prediction, motif prediction and operon structure evaluation. The combined results from all tools were used to predict roles for the hypothetical proteins. This method, which uses a suite of computational tools that are freely available to academic users, can be used to annotate hypothetical proteins in general.

13.
Identification and characterization of antigenic determinants on proteins has received considerable attention, using both experimental and computational methods. Computational routines have mostly used structural and physicochemical parameters to predict the antigenic propensity of protein sites. However, their performance has been low compared with experimental alternatives. Here we describe the construction of machine-learning-based classifiers that improve prediction quality for identifying linear B-cell epitopes on proteins. Our approach combines several parameters previously associated with antigenicity and includes novel parameters based on amino acid frequencies and amino acid neighborhood propensities. We used machine learning algorithms to derive antigenicity classification functions that assign an antigenic propensity to each amino acid of a given protein sequence. We compared the prediction quality of the novel classifiers with that of established epitope-scoring routines and tested prediction accuracy on experimental data available for HIV proteins. The major finding is that the machine learning classifiers clearly outperform the reference classification systems on the HIV epitope validation set.
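Per-residue classifiers of this kind need a feature vector for every residue. One simple way to capture "amino acid neighborhood" information is the amino acid composition of a window centred on each residue, sketched below. The window half-width of 3 and the truncation at sequence ends are assumptions; the paper's actual feature definitions are not given in the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_composition(seq: str, half_width: int = 3) -> list[list[float]]:
    """Per-residue feature vectors: amino acid frequencies in a local window.

    For residue i the window covers positions i - half_width .. i + half_width,
    truncated at the sequence ends. Each vector has 20 entries ordered as in
    AMINO_ACIDS and sums to 1. Non-standard letters raise KeyError.
    """
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    seq = seq.upper()
    features = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half_width), min(len(seq), i + half_width + 1)
        counts = [0] * len(AMINO_ACIDS)
        for aa in seq[lo:hi]:
            counts[index[aa]] += 1
        features.append([c / (hi - lo) for c in counts])
    return features


if __name__ == "__main__":
    feats = window_composition("MKTAYIAKQR", half_width=2)
    print(len(feats), len(feats[0]))
```

Each row could then be concatenated with other per-residue descriptors and passed to any standard classifier.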

14.
Small interfering RNA (siRNA)-mediated knock-down is a widely used experimental approach to characterizing gene function. Although siRNAs are designed to guide the cleavage of perfectly complementary mRNA targets, they also act similarly to microRNAs (miRNAs) and down-regulate the expression of hundreds of genes to which they have only partial complementarity. Predicting these siRNA ‘off-targets’ remains difficult because siRNA/miRNA–target interactions are incompletely understood. By combining a biophysical model of miRNA–target interaction with structure and sequence features of putative target sites, we developed MIRZA-G, a suite of algorithms for predicting miRNA targets and siRNA off-targets on a genome-wide scale. The MIRZA-G variant that uses evolutionary conservation performs better than currently available methods in predicting canonical miRNA target sites, and it also predicts non-canonical miRNA target sites with similarly high accuracy. Furthermore, MIRZA-G variants predict siRNA off-target sites with an accuracy unmatched by currently available programs. Thus, MIRZA-G may prove instrumental in analyzing data from large-scale siRNA screens.
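"Canonical" miRNA target sites are defined by Watson-Crick pairing to the miRNA seed. As a minimal sketch, the code below finds 7mer-m8 sites, i.e. exact matches to the reverse complement of miRNA nucleotides 2-8; it covers only this one canonical site type and does not attempt MIRZA-G's biophysical model or non-canonical sites. The let-7a sequence is used only as example input.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_7mer_m8_sites(mirna: str, mrna: str) -> list[int]:
    """0-based start positions in mrna of 7mer-m8 seed matches.

    A 7mer-m8 site is the reverse complement of miRNA nucleotides 2-8
    (1-based), i.e. mirna[1:8]. Both sequences are RNA written 5'->3'.
    """
    site = reverse_complement(mirna.upper()[1:8])
    mrna = mrna.upper()
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    print(reverse_complement(let7a[1:8]))
    print(seed_7mer_m8_sites(let7a, "AAACUACCUCAAA"))
```

An siRNA guide strand can be scanned the same way, which is one reason seed matches are a standard first-pass filter for off-target candidates.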

15.
16.
A pair of neural network-based algorithms is presented for predicting the tertiary structural class and the secondary structure of proteins. Each algorithm improves its accuracy using information provided by the other. Structural class prediction for proteins nonhomologous to any in the training set improves significantly, from 62.3% to 73.9%, and secondary structure prediction accuracy improves slightly, from 62.26% to 62.64%. Several aspects of neural network optimization and testing are examined, including network overtraining and an output filter based on a rolling average. Secondary structure prediction results vary greatly depending on the particular proteins chosen for the training and test sets; consequently, an appropriate measure of accuracy is the less biased “jackknife” cross-validation, in which each protein in the database is tested individually.
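A rolling-average output filter smooths the network's per-residue scores before the final state is chosen, which removes isolated one-residue predictions. The sketch below assumes three outputs per residue (H, E, C), a centred window of 3 truncated at the chain ends, and an argmax decision; the paper's exact filter settings are not given in the abstract.

```python
def rolling_average(values: list[float], window: int = 3) -> list[float]:
    """Centred moving average; windows are truncated at the sequence ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def filtered_states(outputs, window=3, labels="HEC"):
    """Smooth each output unit's scores along the chain, then take the argmax.

    outputs: one list of len(labels) scores per residue.
    """
    smoothed = [rolling_average(list(col), window) for col in zip(*outputs)]
    return "".join(
        labels[max(range(len(labels)), key=lambda k: smoothed[k][i])]
        for i in range(len(outputs)))


if __name__ == "__main__":
    raw = [[0.9, 0.05, 0.05], [0.4, 0.5, 0.1], [0.9, 0.05, 0.05]]
    print(filtered_states(raw))  # the isolated E in the middle is smoothed away
```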

17.
Prediction of β-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics because of their frequent occurrence and their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has often been used as an additional input to machine learning algorithms for predicting β-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs, and we demonstrate its efficiency in recognizing three aspects of protein structure: β-turns, β-turn types and secondary structure. The advantage of our method over other predictors is that it requires no external input other than sequence profiles, because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset, our method achieves a total prediction accuracy of 77.9% and a Matthews correlation coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually dependent target classes need to be predicted. Availability: http://webclu.bio.wzw.tum.de/predator-web/.
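The Matthews correlation coefficient reported above is computed from the binary confusion matrix (here, turn vs non-turn residues): MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A small sketch follows; returning 0 when a marginal sum is zero is a common convention, not necessarily the authors' choice, and the labels "T"/"N" are illustrative.

```python
import math


def confusion(actual, predicted, positive):
    """Count (tp, tn, fp, fn), treating `positive` as the positive label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    counts = confusion("TTNNNT", "TNNNTT", "T")  # T = turn, N = non-turn
    print(counts, round(matthews_cc(*counts), 3))
```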

18.
We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three-layer back-propagation network and (2) a learning vector quantization network. The results of these methods are compared with those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used as input to these algorithms consist of the normalized frequencies of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values, the structural class of each protein (all-alpha, all-beta or alpha-beta) is predicted. A set of 64 previously classified proteins was randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best-performing algorithm on the test sets was the learning vector quantization network using 17 inputs, with a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes, whereas the differences between algorithms are in general not statistically significant. These results show that protein primary sequences contain easily obtainable information that is useful for predicting protein structural class with neural networks as well as with standard statistical clustering algorithms.
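The statistical baseline can be illustrated by a plain Euclidean minimum-distance (nearest-centroid) classifier on amino acid composition. This is a simplified stand-in, not the paper's modified clustering algorithm; the hydrophobic pattern frequencies are omitted, and the training sequences below are made-up toy inputs.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Normalized frequencies of the 20 standard amino acids in seq."""
    seq = seq.upper()
    n = sum(seq.count(aa) for aa in AMINO_ACIDS)
    if n == 0:
        raise ValueError("sequence has no standard amino acids")
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def train_centroids(examples):
    """examples: list of (sequence, class_label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for seq, label in examples:
        vec = composition(seq)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(seq, centroids):
    """Assign seq to the class whose centroid is nearest in Euclidean distance."""
    vec = composition(seq)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


if __name__ == "__main__":
    toy = [("AAAALLLLEEEE", "all-alpha"), ("VVVVIIIITTTT", "all-beta")]
    print(classify("AALLEEAA", train_centroids(toy)))
```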

19.
We present a protocol for predicting protein flexibility from NMR chemical shifts. The protocol consists of (i) ensuring that the chemical shift assignments are correctly referenced or, if not, performing a reference correction using information derived from the chemical shift index, (ii) calculating the random coil index (RCI), and (iii) predicting the expected root mean square fluctuations (RMSFs) and order parameters (S²) of the protein from the RCI. The key advantages of this protocol over existing methods for studying protein dynamics are that (i) it does not require prior knowledge of a protein's tertiary structure, (ii) it is not sensitive to the protein's overall tumbling and (iii) it does not require additional NMR measurements beyond the standard experiments for backbone assignments. When chemical shift assignments are available, protein flexibility parameters, such as S² and RMSF, can be calculated within 1-2 h using a spreadsheet program.
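The idea behind step (ii) is that residues whose chemical shifts stay close to random-coil values are likely to be flexible, so an index can be built as the inverse of a weighted average of absolute secondary shifts. The sketch below illustrates only that inverse relationship: the nucleus weights, the 3-residue smoothing, the floor value and the function name are illustrative assumptions, not the published RCI coefficients, and step (iii), the mapping to RMSF and S², is not attempted.

```python
def simplified_rci(secondary_shifts, weights, smooth=3, floor=0.01):
    """Simplified random-coil-index-like score per residue (illustrative only).

    secondary_shifts: one dict per residue mapping a nucleus name (e.g. "CA")
    to its observed-minus-random-coil shift in ppm.
    weights: nucleus name -> weight (hypothetical values).
    Small |secondary shifts| give a high score (more flexible); residues with
    no usable shifts in their smoothing window get None.
    """
    per_residue = []
    for shifts in secondary_shifts:
        used = [n for n in shifts if n in weights]
        if not used:
            per_residue.append(None)
            continue
        total_w = sum(weights[n] for n in used)
        per_residue.append(sum(weights[n] * abs(shifts[n]) for n in used) / total_w)
    half = smooth // 2
    scores = []
    for i in range(len(per_residue)):
        window = [v for v in per_residue[max(0, i - half): i + half + 1]
                  if v is not None]
        if not window:
            scores.append(None)
        else:
            # invert the mean deviation; the floor avoids division by zero
            scores.append(1.0 / max(sum(window) / len(window), floor))
    return scores


if __name__ == "__main__":
    shifts = [{"CA": 3.0, "CB": -1.0}, {"CA": 3.0, "CB": -1.0},
              {"CA": 0.1, "CB": 0.1}, {"CA": 0.1, "CB": 0.1}]
    print(simplified_rci(shifts, {"CA": 1.0, "CB": 1.0}))
```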

20.
Insight into the functions and interactions of proteins may be gained by correlating a variety of experimental data (including kinetic, spectroscopic and biophysical measurements, among others) with three-dimensional structural models displayed and manipulated using interactive computer graphics. Although tertiary structures have been determined for a large number of proteins, one limiting factor in structure-function studies is that structural coordinates are often unavailable for the specific proteins for which other detailed experimental data are known. However, as the database of known structures grows, it becomes increasingly likely that the structure of a closely related protein will be available. Here we present a method for predicting structures by (1) careful alteration of a known structure of a homologous, functionally analogous protein, followed by (2) energy minimization to optimize the predicted structure. This method provides a rapid and effective solution to the initial problem of obtaining a working structure for modeling studies.
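Step (2), energy minimization, can be illustrated with steepest descent on a toy energy. The sketch below uses only harmonic bond terms, E = Σ k(|rᵢ − rⱼ| − r₀)², with an adaptive step size; real minimization uses a full molecular force field, which the abstract does not name, so the energy function, step rule and parameters here are all illustrative assumptions.

```python
import math


def bond_energy_and_grad(coords, bonds, k=1.0):
    """Harmonic bond energy sum k (|r_i - r_j| - r0)^2 and its gradient.

    bonds: list of (i, j, r0). Bonded atoms must not coincide (r > 0).
    """
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, r0 in bonds:
        d = [coords[i][a] - coords[j][a] for a in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        diff = r - r0
        energy += k * diff * diff
        for a in range(3):
            g = 2.0 * k * diff * d[a] / r  # dE/dr_i along axis a
            grad[i][a] += g
            grad[j][a] -= g
    return energy, grad


def steepest_descent(coords, bonds, step=0.1, max_iter=10000, gtol=1e-6):
    """Minimize the toy energy by steepest descent with an adaptive step."""
    coords = [list(p) for p in coords]
    energy, grad = bond_energy_and_grad(coords, bonds)
    for _ in range(max_iter):
        if max(abs(g) for gp in grad for g in gp) < gtol:
            break  # converged: gradient is essentially zero
        trial = [[x - step * g for x, g in zip(p, gp)]
                 for p, gp in zip(coords, grad)]
        e_trial, g_trial = bond_energy_and_grad(trial, bonds)
        if e_trial < energy:
            coords, energy, grad = trial, e_trial, g_trial
            step *= 1.2  # accept the move and try a larger step next time
        else:
            step *= 0.5  # reject the move and retry with a smaller step
    return coords, energy


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0)]
    bonds = [(0, 1, 1.0), (1, 2, 1.0)]
    final, e = steepest_descent(start, bonds)
    print(round(math.dist(final[0], final[1]), 4), round(e, 8))
```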

