Similar articles
20 similar articles found.
1.
Predicting the subcellular localisation of proteins is an important part of the elucidation of their functions and interactions. Here, the amino acid sequence motifs that direct proteins to their proper subcellular compartment are surveyed, different methods for localisation prediction are discussed, and some benchmarks for the more commonly used predictors are presented.

2.
Membrane proteins are a vital class of proteins that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of these types. However, classification of membrane protein types can be both time-consuming and error-prone because of the inherent similarity among the types. In this paper, a neural-network-based system for predicting membrane protein types is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence; it comprises seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A success rate of 86.01% is obtained using SVM in the jackknife test. In the independent dataset test, PNN yields the highest accuracy, 95.73%. These classifiers also show improved performance on other measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This improvement may largely be credited to the learning capabilities of neural networks and to the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor.
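As a rough illustration of the two simplest feature sets named in this abstract (amino acid composition and sequence length), the sketch below builds a 21-value feature vector from a one-letter sequence. The function names and the choice to ignore non-standard residue codes are assumptions made for this example; the abstract does not describe how CPSR is actually implemented, and the other five feature sets, the PCA step, and the classifiers are not shown.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (order is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard symbols (e.g. X, B, Z) are ignored; this is an assumption
    of the sketch, not something stated in the abstract.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def simple_features(sequence: str) -> list[float]:
    """Hypothetical helper: 20 composition values followed by the length."""
    return amino_acid_composition(sequence) + [float(len(sequence))]


if __name__ == "__main__":
    # Toy sequence, used only to show the shape of the output.
    print(simple_features("MKTAYIAKQR"))
```

A vector like this would normally be concatenated with the remaining feature sets before any dimensionality reduction.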

3.

Background  

The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions with other residues.

4.
5.

Background  

Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.

6.
Predicting surface exposure of amino acids from protein sequence   Total citations: 8 (self-citations: 0; citations by others: 8)
The amino acid residues on a protein surface play a key role in interaction with other molecules, determine many physical properties, and constrain the structure of the folded protein. A database of monomeric protein crystal structures was used to teach computer-simulated neural networks rules for predicting surface exposure from local sequence. These trained networks are able to correctly predict surface exposure for 72% of residues in a testing set using a binary model (buried/exposed) and for 54% of residues using a ternary model (buried/intermediate/exposed). In the ternary model, only 11% of the exposed residues are predicted as buried and only 5% of the buried residues are predicted as exposed. Also, since the networks are able to predict exposure with a quantitative confidence estimate, it is possible to assign exposure for over half of the residues in a binary model with greater than 80% accuracy. Even more accurate predictions are obtained by making a consensus prediction of exposure for a homologous family. The effect of the local environment of an amino acid on its accessibility, though smaller than expected, is significant and accounts for the higher success rate of prediction than obtained with previously used criteria. In the absence of a three-dimensional structure, the ability to predict surface accessibility of amino acids directly from the sequence is a valuable tool in choosing sites of chemical modification or specific mutations and in studies of molecular interaction.
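Predicting a per-residue property "from local sequence" is commonly set up by giving the network a fixed-size window of residues centred on the position of interest. The sketch below shows one such input encoding (one-hot blocks, zero-padded at the chain ends). The window half-width of 3, the one-hot scheme, and the function name are assumptions for illustration; the abstract does not state which encoding or window size the authors used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_windows(sequence: str, half_width: int = 3) -> list[list[int]]:
    """One-hot encode a window of 2*half_width + 1 residues around each position.

    Each window position contributes a block of 20 values. Positions that fall
    outside the sequence, or hold a non-standard symbol, give an all-zero block.
    Returns one row per residue, ready to pair with a buried/exposed label.
    """
    n_symbols = len(AMINO_ACIDS)
    rows = []
    for center in range(len(sequence)):
        row = []
        for pos in range(center - half_width, center + half_width + 1):
            block = [0] * n_symbols
            if 0 <= pos < len(sequence) and sequence[pos] in INDEX:
                block[INDEX[sequence[pos]]] = 1
            row.extend(block)
        rows.append(row)
    return rows


if __name__ == "__main__":
    rows = encode_windows("ACDEFG", half_width=1)
    print(len(rows), "rows of", len(rows[0]), "inputs each")
```

Each row would then serve as the input for one residue, with the known exposure class from a crystal structure as the training target.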

7.
This report continues to explore the use of a strategy known as the antlion method for predicting polypeptide and protein structure. The method involves deformation of a biopolymer's potential energy hypersurface in order to retain only a single minimum, near to the native structure. The vexing multiple minimum problem thus is relieved, and the deformed hypersurface constitutes a key element in three-dimensional structure predictions with atomic resolution. In this more demanding pilot study, we provide evidence that the antlion method is capable of dramatically simplifying the surface of polypeptides by successfully predicting the native form of the naturally occurring 26-residue polypeptide melittin. The systematic hypersurface modifications employed in our previous work have been used again for this case, but have been supplemented by the output of a suitable neural network. This neural network involves a new feature: the use of amino acid biophysical scales for improving the secondary structure prediction accuracy of simple perceptrons. © 1993 John Wiley & Sons, Inc.

8.
Complete amino acid sequence of protein B   Total citations: 4 (self-citations: 0; citations by others: 4)
The complete amino acid sequence of protein B (= CAMP factor) of Streptococcus agalactiae has been determined. The sequence data were obtained mainly by manual sequencing of peptides derived from digestion with lysyl-peptidase, clostripain and Staphylococcus aureus protease and by solid phase sequencing of cyanogen bromide fragments. The protein contains 226 amino acids and has an Mr of 25,263. The sequence was compared with sequences of other Fc-binding proteins and partial sequence homology was found between protein B and the Fc-binding region of protein A.

9.
Protein solubility plays a major role in understanding the crystal growth and crystallization of proteins. Predicting whether a protein will be soluble or will form inclusion bodies is a long-standing problem that remains largely unresolved. After choosing almost 10,000 protein sequences from the NCBI database and using CD-HIT to eliminate the sequences with 90% homologous similarity, 5692 sequences remained. Using Chou's pseudo amino acid composition features, we predict soluble proteins with three methods: support vector machine (SVM), back propagation neural network (BP Neural Network), and a hybrid method based on SVM and BP Neural Network. Each method is evaluated by a re-substitution test and a 10-fold cross-validation test. In the re-substitution test, the BP Neural Network performs best, with an accuracy of 0.9288 and a Matthews correlation coefficient (MCC) of 0.8513. In the 10-fold cross-validation test, however, the other two methods outperform the BP Neural Network, and the hybrid method based on SVM and BP Neural Network is the best, with an average accuracy of 0.8678 and an average MCC of 0.7233. Although all three methods achieve good evaluation results, the hybrid method is deemed the best according to the performance comparison.
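The two evaluation measures quoted in this abstract, accuracy and MCC, can both be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names are chosen for this example, and returning 0.0 when the MCC denominator vanishes is a common convention rather than anything stated in the abstract.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero (a convention, assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up counts, for illustration only (not data from the abstract).
    tp, tn, fp, fn = 80, 70, 10, 15
    print("accuracy:", accuracy(tp, tn, fp, fn))
    print("MCC:", matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside accuracy when the two classes differ in size.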

10.
Intrinsically disordered regions of proteins are known to have many functional roles in cell signaling and regulatory pathways. The altered expression of these proteins due to mutations is associated with various diseases. Currently, most of the available methods focus on predicting the disordered proteins or the disordered regions in a protein. On the other hand, methods developed for predicting protein disorder on mutation showed a poor performance with a maximum accuracy of 70%. Hence, in this work, we have developed a novel method to classify the disorder-related amino acid substitutions using amino acid properties, substitution matrices, and the effect of neighboring residues that showed an accuracy of 90.0% with a sensitivity and specificity of 94.9 and 80.6%, respectively, in 10-fold cross-validation. The method was evaluated with a test set of 20% data using 10 iterations, which showed an average accuracy of 88.9%. Furthermore, we systematically analyzed the features responsible for the better performance of our method and observed that neighboring residues play an important role in defining the disorder of a given residue in a protein sequence. We have developed a prediction server to identify disorder-related mutations, and it is available at http://www.iitm.ac.in/bioinfo/DIM_Pred/.
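Accuracy, sensitivity, and specificity are linked: accuracy is the average of sensitivity and specificity weighted by the fraction of positive and negative examples. The sketch below uses that identity to back out the positive fraction implied by a set of reported figures. The function names are chosen for this example, and the value obtained from the abstract's rounded percentages (about 0.66) is only an inference from those numbers, not a figure reported by the authors.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def implied_positive_fraction(acc: float, sens: float, spec: float) -> float:
    """Solve acc = p * sens + (1 - p) * spec for p, the fraction of positives.

    Hypothetical helper for sanity-checking reported figures. It is undefined
    when sensitivity equals specificity, since any p then fits.
    """
    if sens == spec:
        raise ValueError("positive fraction is not identifiable when sens == spec")
    return (acc - spec) / (sens - spec)


if __name__ == "__main__":
    # Rounded values quoted in the abstract for 10-fold cross-validation.
    p = implied_positive_fraction(0.900, 0.949, 0.806)
    print(f"implied fraction of positive examples: {p:.2f}")
```

Because the inputs are rounded, the result is approximate and should be read as a consistency check only.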

11.
12.
Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms. SIFT is available at http://blocks.fhcrc.org/sift/SIFT.html.

13.
A neural network-based tool, TargetP, for large-scale subcellular location prediction of newly identified proteins has been developed. Using N-terminal sequence information only, it discriminates between proteins destined for the mitochondrion, the chloroplast, the secretory pathway, and other localizations with a success rate of 85% (plant) or 90% (non-plant) on redundancy-reduced test sets. From a TargetP analysis of the recently sequenced Arabidopsis thaliana chromosomes 2 and 4 and the Ensembl Homo sapiens protein set, we estimate that 10% of all plant proteins are mitochondrial and 14% chloroplastic, and that the abundance of secretory proteins, in both Arabidopsis and Homo, is around 10%. TargetP also predicts cleavage sites with levels of correctly predicted sites ranging from approximately 40% to 50% (chloroplastic and mitochondrial presequences) to above 70% (secretory signal peptides). TargetP is available as a web-server at http://www.cbs.dtu.dk/services/TargetP/.

14.
A novel approach CE-Ploc is proposed for predicting protein subcellular locations by exploiting diversity both in feature and decision spaces. The diversity in a sequence of feature spaces is exploited using hydrophobicity and hydrophilicity of amphiphilic pseudo amino acid composition and a specific learning mechanism. Diversity in learning mechanisms is exploited by fusion of classifiers that are based on different learning mechanisms. Significant improvement in prediction performance is observed using jackknife and independent dataset tests.

15.
The conformational parameters $P_k$ for each amino acid species ($j = 1$–$20$) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residue in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{av} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{av}$ to convert the multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n) \sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. This differs from the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases, mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
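The contrast drawn in this abstract, a geometric mean computed through a sum of logarithms versus the arithmetic mean, can be shown in a few lines. The sketch below is only an illustration of the two averages: the function names are chosen for this example, and the per-residue parameter values are made up, not taken from the paper or from any published parameter table.

```python
import math


def geometric_mean_parameter(params: list[float]) -> float:
    """Geometric mean of per-residue parameters, computed in log space.

    Implements ln(P_k)_av = (1/n) * sum(ln P_{i,k}), then exponentiates.
    All parameters must be positive for the logarithm to be defined.
    """
    if not params:
        raise ValueError("segment must contain at least one residue")
    if any(p <= 0 for p in params):
        raise ValueError("parameters must be positive")
    mean_log = sum(math.log(p) for p in params) / len(params)
    return math.exp(mean_log)


def arithmetic_mean_parameter(params: list[float]) -> float:
    """Plain arithmetic mean, the averaging the abstract attributes to Chou-Fasman."""
    if not params:
        raise ValueError("segment must contain at least one residue")
    return sum(params) / len(params)


if __name__ == "__main__":
    # Hypothetical helix parameters for a 4-residue segment containing one
    # strong former and one strong breaker.
    segment = [1.5, 0.5, 1.2, 0.8]
    print("geometric: ", geometric_mean_parameter(segment))
    print("arithmetic:", arithmetic_mean_parameter(segment))
```

The geometric mean never exceeds the arithmetic mean, and the gap widens as the values spread out, which is consistent with the abstract's remark that the two approaches diverge when strong formers and breakers are present.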

16.
The amino acid sequence of the Q coat protein   Total citations: 1 (self-citations: 0; citations by others: 1)

17.
Prediction of protein structural class from the amino acid sequence   Total citations: 9 (self-citations: 0; citations by others: 9)
P Klein, C Delisi. Biopolymers, 1986, 25(9): 1659-1672
The multidimensional statistical technique of discriminant analysis is used to allocate amino acid sequences to one of four secondary structural classes: high α content, high β content, mixed α and β, low content of ordered structure. Discrimination is based on four attributes: estimates of percentages of α and β structures, and regular variations in the hydrophobic values of residues along the sequence, occurring with periods of 2 and 3.6 residues. The reliability of the method, estimated by classifying 138 sequences from the Brookhaven Protein Data Bank, is 80%, with no misallocations between α-rich and β-rich classes. The reliability can be increased to 84% by making no allocation for proteins classified with odds close to 1. Classification using previously developed secondary structural prediction methods is considerably less reliable, the best result being 64% obtained using predictions based on the Delphi method.
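Regular variation in hydrophobicity with a period of 2 residues (typical of β-strands) or 3.6 residues (typical of α-helices) can be quantified by the magnitude of the Fourier component of the hydrophobicity profile at that period. The sketch below is one common way to do this and is not necessarily the measure used in the paper; the function name is an assumption, and the caller supplies per-residue hydrophobicity values from whatever scale is preferred.

```python
import math


def periodic_power(values: list[float], period: float) -> float:
    """Magnitude of the Fourier component of `values` at `period` residues.

    The mean is subtracted first so that a uniformly hydrophobic segment
    does not register as periodic.
    """
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    omega = 2 * math.pi / period  # angular step per residue
    cos_sum = sum((v - mean) * math.cos(omega * i) for i, v in enumerate(values))
    sin_sum = sum((v - mean) * math.sin(omega * i) for i, v in enumerate(values))
    return math.hypot(cos_sum, sin_sum)


if __name__ == "__main__":
    # Toy profile alternating hydrophobic/hydrophilic, as in an idealised
    # amphipathic strand; the values are illustrative, not from a real scale.
    strand_like = [1.0, -1.0] * 10
    print("period 2:  ", periodic_power(strand_like, 2.0))
    print("period 3.6:", periodic_power(strand_like, 3.6))
```

For the alternating toy profile the period-2 component dominates, as expected; a helical amphipathic profile would instead peak near 3.6.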

18.
The elucidation of protein function from its amino acid sequence   Total citations: 1 (self-citations: 0; citations by others: 1)
This review gives an outline of how computers may be used to determine the function of a protein when only its primary structure is known. The current programming methods are outlined in general terms before their detailed application is discussed, and the common ways of predicting protein structure are also introduced. Identification is usually by database searching and sequence alignment, though a collection of motifs relating sequence to function is also described.

19.
The complete amino acid sequence of the calcium-binding protein (CaBP) from pig intestinal mucosa has been determined: Ac-Ser-Ala-Gln-Lys-Ser-Pro-Ala-Glu-Leu-Lys-Ser-Ile-Phe-Glu-Lys-Tyr-Ala-Ala-Lys-Glu-Gly-Asp-Pro-Asn-Gln-Leu-Ser-Lys-Glu-Glu-Leu-Lys-Gln-Leu-Ile-Gln-Ala-Glu-Phe-Pro-Ser-Leu-Leu-Lys-Gly-Pro-Arg-Thr-Leu-Asp-Asp-Leu-Phe-Gln-Glu-Leu-Asp-Lys-Asn-Gly-Asn-Gly-Glu-Val-Ser-Phe-Glu-Glu-Phe-Gln-Val-Leu-Val-Lys-Lys-Ile-Ser-Gln-OH. The N-terminal octapeptide sequence was determined by mass spectrometric analysis by Morris and Dell. The first 45 residues of bovine CaBP differ only in six positions from the corresponding sequence of the porcine protein, except that the sequence starts in position two of the porcine sequence. The mammalian intestinal CaBPs belong to the troponin-C superfamily on the basis of an analysis by Barker and Dayhoff.

20.
The amino acid sequence of the first thirty-nine residues of the nonhistone chromosomal protein HMG-17 has been determined. Results presented here give a molecular weight of 11,000 for the protein. Some interesting sequence homology with the trout-specific histone, histone-T, is noted.
