Similar literature
 20 similar articles found (search time: 843 ms)
1.
2.
Li Y  Wen Z  Zhou C  Tan F  Li M 《Peptides》2008,29(9):1498-1504
Signal peptides play a pivotal role in the translocation of secretory proteins, and several models have been designed to predict their cleavage sites. The cleavage site is reported to be related to the neighboring sequence environment, i.e., the hydrophobic core (h-region) and the specific patterns in the c-region. In some studies this finding does facilitate the prediction of the cleavage site. However, these models use sequence environment information merely as a model input, and no detailed investigation of its effect on cleavage site prediction has been made. In this work, we analyze the constraint placed on the cleavage site by the hydrophobic core of the signal peptide and then use it to improve the performance of signal peptide cleavage site prediction. Our model is designed as follows: first, a sliding window is used to scan the sample, and an artificial neural network (ANN) is employed to give cleavage site/non-cleavage site scores. Then, based on an estimated hydrophobic h-region, a correcting function that takes the sequence environment into account is proposed to improve the prediction result. Our analysis indicates a trend of cleavage site for each position, which is consistent with experimental findings. This correcting step improves prediction accuracy by over 7%, demonstrating that the neighboring sequence environment is helpful for determining the cleavage site. A program written in Matlab can be downloaded from http://www.scucic.cn/combined model/source code.html.
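
The sliding-window scan described in this abstract can be sketched as follows. This is only an illustrative sketch, not the authors' Matlab program: the function names, the window width, and in particular `toy_score` (a hypothetical stand-in for the trained ANN that simply measures the fraction of hydrophobic residues) are assumptions, and the h-region correcting function is not reproduced because the abstract does not specify it.

```python
from typing import Callable, List, Tuple


def sliding_windows(seq: str, width: int) -> List[Tuple[int, str]]:
    """Return (start_index, window) pairs for every full-width window."""
    return [(i, seq[i:i + width]) for i in range(len(seq) - width + 1)]


def toy_score(window: str) -> float:
    """Hypothetical stand-in for the ANN: fraction of hydrophobic residues."""
    return sum(c in "AVLIMFWC" for c in window) / len(window)


def best_window(seq: str, width: int,
                score: Callable[[str], float]) -> Tuple[int, float]:
    """Score every window and return (start_index, score) of the first best one."""
    windows = sliding_windows(seq, width)
    start, window = max(windows, key=lambda item: score(item[1]))
    return start, score(window)


if __name__ == "__main__":
    # Made-up sequence, used only to show the call pattern.
    print(best_window("MKLLVLAAGS", 4, toy_score))
```

In a real predictor the `score` argument would be the trained classifier's output for the window, and a second pass could re-weight the scores by position relative to the estimated h-region.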

3.
J D Hirst  M J Sternberg 《Biochemistry》1992,31(32):7211-7218
The applications of artificial neural networks to the prediction of structural and functional features of protein and nucleic acid sequences are reviewed. A brief introduction to neural networks is given, including a discussion of learning algorithms and sequence encoding. The protein applications mostly involve the prediction of secondary and tertiary structure from sequence. The problems in nucleic acid analysis tackled by neural networks are the prediction of translation initiation sites in Escherichia coli, the recognition of splice junctions in human mRNA, and the prediction of promoter sites in E. coli. The performance of the approach is compared with other current statistical methods.

4.
This work presents a dynamic artificial neural network methodology that classifies proteins into their classes from their sequences alone: the lysosomal membrane protein classes and the various other membrane protein classes. A neural network-based prediction system for lysosomal-associated membrane protein types is proposed. Different protein sequence representations are fused to extract the features of a protein sequence, comprising seven feature sets: amino acid (AA) composition, sequence length, hydrophobic group, electronic group, sum of hydrophobicity, R-group, and dipeptide composition. To reduce the dimensionality of the large feature vector, we applied principal component analysis. The probabilistic neural network, generalized regression neural network, and Elman regression neural network (RNN) are used as classifiers and compared with the layer recurrent network (LRN), a dynamic network. Dynamic networks have memory, i.e., their output depends not only on the input but also on the previous outputs. Thus, the LRN classifier achieves the highest accuracy among the artificial neural networks compared. The overall accuracy of jackknife cross-validation is 93.2% for the data-set. These results suggest that the method can be effectively applied to discriminate lysosomal-associated membrane proteins from other membrane proteins (Type-I, outer membrane proteins, GPI-anchored) and globular proteins, and they also indicate that the fused protein sequence representation reflects the core features of membrane proteins better than the classical AA composition.

5.
6.
Neuropeptides are an important class of signaling molecules that result from complex and variable posttranslational processing of precursor proteins and thus are difficult to identify based solely on genomic information. Bioinformatics prediction of precursor cleavage sites can support effective biochemical characterization of neuropeptides. Neuropeptide cleavage models were developed using comprehensive human, mouse, rat, and cattle precursor data sets and used to compare predicted neuropeptide processing across these species. Logistic regression and artificial neural network models were used to predict cleavages based on amino acids and the physicochemical properties of amino acids at precursor sequence locations proximal to cleavage. Correct cleavage classification rates across species and models ranged from 85% to 100%, suggesting that amino acids and amino acid properties have a major impact on the probability of cleavage and that these factors have comparable effects in human, mouse, rat, and cattle. The variable accuracy of each species-specific model in predicting cleavage sites indicated that there are species- and precursor-specific processing patterns. Prediction of mouse cleavages using rat models was highly accurate, yet the reverse was not observed. Sensitivity and specificity revealed that logistic models are well suited to maximize the rate of true noncleavage predictions with moderate rates of true cleavage predictions; meanwhile, artificial neural networks maximize the rate of true cleavage predictions with moderate to low true noncleavage predictions. Logistic models also provided insights into the strength of the amino acid associations with cleavage. Predictions of neuropeptide cleavage sites using human, mouse, rat, and cattle models are available at . Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.
Allison Tegge and Bruce Southey contributed equally to this work.

7.
MOTIVATION: Remote homology detection is among the most intensively researched problems in bioinformatics. Currently discriminative approaches, especially kernel-based methods, provide the most accurate results. However, kernel methods also show several drawbacks: in many cases prediction of new sequences is computationally expensive, often kernels lack an interpretable model for analysis of characteristic sequence features, and finally most approaches make use of so-called hyperparameters which complicate the application of methods across different datasets. RESULTS: We introduce a feature vector representation for protein sequences based on distances between short oligomers. The corresponding feature space arises from distance histograms for any possible pair of K-mers. Our distance-based approach shows important advantages in terms of computational speed while on common test data the prediction performance is highly competitive with state-of-the-art methods for protein remote homology detection. Furthermore the learnt model can easily be analyzed in terms of discriminative features and in contrast to other methods our representation does not require any tuning of kernel hyperparameters. AVAILABILITY: Normalized kernel matrices for the experimental setup can be downloaded at www.gobics.de/thomas. Matlab code for computing the kernel matrices is available upon request. CONTACT: thomas@gobics.de, peter@gobics.de.
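
One possible reading of "distance histograms for any possible pair of K-mers" is sketched below. The abstract does not give the exact definition, so the details here are assumptions: distances are counted between start positions, only forward distances from 1 to a hypothetical `max_dist` are considered, and no normalization is applied. The function name is illustrative and is not taken from the authors' Matlab code.

```python
from collections import Counter
from typing import Counter as CounterType, Tuple


def kmer_distance_histogram(seq: str, k: int,
                            max_dist: int) -> CounterType[Tuple[str, str, int]]:
    """Count occurrences of (kmer_a, kmer_b, d): kmer_a starts at position i
    and kmer_b starts at position i + d, for 1 <= d <= max_dist."""
    n_starts = len(seq) - k + 1  # number of valid k-mer start positions
    counts: CounterType[Tuple[str, str, int]] = Counter()
    for i in range(n_starts):
        a = seq[i:i + k]
        for d in range(1, max_dist + 1):
            j = i + d
            if j >= n_starts:
                break  # the second k-mer would run past the sequence end
            counts[(a, seq[j:j + k], d)] += 1
    return counts


if __name__ == "__main__":
    # Toy two-letter alphabet, used only to keep the output small.
    for key, value in sorted(kmer_distance_histogram("ABAB", 1, 2).items()):
        print(key, value)
```

Each distinct key corresponds to one dimension of the feature vector, so a sequence is represented by its (sparse) vector of counts, which a linear classifier can then weight and inspect directly.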

8.
A cDNA (VUpur5) encoding phosphoribosyl aminoimidazole (AIR) synthetase, the fifth enzyme of the de novo purine biosynthesis pathway has been isolated from a cowpea nodule cDNA library. It encodes a 388 amino acid protein with a predicted molecular mass of 40.4 kDa. The deduced amino acid sequence has significant homology with AIR synthetase from other organisms. AIR synthetase is present in both mitochondria and plastids of cowpea nodules [7]. A signal sequence encoded by the VUpur5 cDNA has properties associated with plastid transit sequences but there is no consensus cleavage site as would be expected for a plastid targeted protein. Although the signal sequence does not have the structural features of a mitochondrial targeted protein, it has a mitochondrial cleavage site motif (RX/XS) close to the predicted N-terminus of the mature protein. Southern analysis suggests that AIR synthetase is encoded by a single gene raising questions as to how the product of this gene is targeted to the two organelles. VUpur5 is expressed at much higher levels in nodules compared to other cowpea tissues and the gene is active before nitrogen fixation begins. These results suggest that products of nitrogen fixation do not play a role in the initial induction of gene expression. VUpur5 was expressed in Escherichia coli and the recombinant protein used to raise antibodies. These antibodies recognize two forms of AIR synthetase which differ in molecular size. Both forms are present in mitochondria, although the larger protein is more abundant. Only the smaller protein was detected in plastids.

9.
We present here a neural network-based method for detection of signal peptides (abbreviation used: SP) in proteins. The method is trained on sequences of known signal peptides extracted from the Swiss-Prot protein database and is able to work separately on prokaryotic and eukaryotic proteins. A query protein is dissected into overlapping short sequence fragments, and then each fragment is analyzed with respect to the probability of it being a signal peptide and containing a cleavage site. While the accuracy of the method is comparable to that of other existing prediction tools, it provides a significantly higher speed and portability. The accuracy of cleavage site prediction reaches 73% on heterogeneous source data that contains both prokaryotic and eukaryotic sequences while the accuracy of discrimination between signal peptides and non-signal peptides is above 93% for any source dataset. As a consequence, the method can be easily applied to genome-wide datasets. The software can be downloaded freely from http://rpsp.bioinfo.pl/RPSP.tar.gz.

10.
Structural class characterizes the overall folding type of a protein or its domain. This paper develops an accurate method for in silico prediction of structural classes from low homology (twilight zone) protein sequences. The proposed LLSC-PRED method applies a linear logistic regression classifier and a custom-designed, feature-based sequence representation to provide predictions. The main advantages of LLSC-PRED are the comprehensive representation, which includes 58 features describing composition and physicochemical properties of the sequences, and the transparency of the prediction model. The representation also includes predicted secondary structure content, thus for the first time exploring synergy between these two related predictions. Based on tests performed with a large set of 1673 twilight zone domains, LLSC-PRED's prediction accuracy of over 62% is shown to be better than the accuracy of over a dozen recently published competing in silico methods and similar to the accuracy of other, non-transparent classifiers that use the proposed representation.

11.
Picornaviral proteinases are responsible for maturation cleavages of the viral polyprotein, but also catalyze the degradation of cellular targets. Using graphical visualization techniques and neural network algorithms, we have investigated the sequence specificity of the two proteinases 2Apro and 3Cpro. The cleavage of VP0 (giving rise to VP2 and VP4), which is carried out by a so-far unknown proteinase, was also examined. In combination with a novel surface exposure prediction algorithm, our neural network approach successfully distinguishes known cleavage sites from noncleavage sites and yields a more consistent definition of features common to these sites. The method is able to predict experimentally determined cleavage sites in cellular proteins. We present a list of mammalian and other proteins that are predicted to be possible targets for the viral proteinases. Whether these proteins are indeed cleaved awaits experimental verification. Additionally, we report several errors detected in the protein databases. A computer server for prediction of cleavage sites by picornaviral proteinases is publicly available at the e-mail address NetPicoRNA@cbs.dtu.dk or via WWW at http://www.cbs.dtu.dk/services/NetPicoRNA/.

12.
13.
We have developed an automated method for predicting signal peptide sequences and their cleavage sites in eukaryotic and bacterial protein sequences. It is a 2-layer predictor: the 1st-layer prediction engine is to identify a query protein as secretory or non-secretory; if it is secretory, the process will be automatically continued with the 2nd-layer prediction engine to further identify the cleavage site of its signal peptide. The new predictor is called Signal-CF, where C stands for "coupling" and F for "fusion", meaning that Signal-CF is formed by incorporating the subsite coupling effects along a protein sequence and by fusing the results derived from many width-different scaled windows through a voting system. Signal-CF is featured by high success prediction rates with short computational time, and hence is particularly useful for the analysis of large-scale datasets. Signal-CF is freely available as a web-server at http://chou.med.harvard.edu/bioinf/Signal-CF/ or http://202.120.37.186/bioinf/Signal-CF/.

14.
Previous work in predicting protein localization to the chloroplast organelle in plants led to the development of an artificial neural network-based approach capable of remarkable accuracy in its prediction (ChloroP). A common criticism against such neural network models is that it is difficult to interpret the criteria that are used in making predictions. We address this concern with several new prediction methods that base predictions explicitly on the abundance of different amino acid types in the N-terminal region of the protein. Our successful prediction accuracy suggests that ChloroP uses little positional information in its decision-making; an unexpected result given the elaborate ChloroP input scheme. By removing positional information, our simpler methods allow us to identify those amino acids that are useful for successful prediction. The identification of important sequence features, such as amino acid content, is advantageous if one of the goals of localization predictors is to gain an understanding of the biological process of chloroplast localization. Our most accurate predictor combines principal component analysis and logistic regression. Web-based prediction using this method is available online at http://apicoplast.cis.upenn.edu/pclr/.
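
The position-free feature described in this abstract, the abundance of each amino acid type in the N-terminal region, might be computed roughly as below. This is a sketch under stated assumptions: the region length `n_residues` (default 50) and the function name are hypothetical choices that the abstract does not specify, and the subsequent principal component analysis and logistic regression steps are not shown.

```python
from collections import Counter
from typing import Dict

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_composition(seq: str, n_residues: int = 50) -> Dict[str, float]:
    """Return the fraction of each standard amino acid in the first
    n_residues of seq; non-standard characters are ignored."""
    region = seq[:n_residues].upper()
    counts = Counter(c for c in region if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Made-up sequence; only the first 4 residues (MASS) are counted.
    composition = n_terminal_composition("MASSS", n_residues=4)
    print({aa: f for aa, f in composition.items() if f > 0})
```

Because the resulting 20-dimensional vector discards residue order, any predictive power it retains comes from composition alone, which is the point the abstract makes about positional information.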

15.
Artificial intelligence techniques for bioinformatics   Cited by: 1 (self-citations: 0, citations by others: 1)
This review provides an overview of the ways in which techniques from artificial intelligence (AI) can be usefully employed in bioinformatics, both for modelling biological data and for making new discoveries. The paper covers three techniques: symbolic machine learning approaches (nearest neighbour and identification tree techniques), artificial neural networks and genetic algorithms. Each technique is introduced and supported with examples taken from the bioinformatics literature. These examples include folding prediction, viral protease cleavage prediction, classification, multiple sequence alignment and microarray gene expression analysis.

16.
Proteinases play critical roles in both intracellular and extracellular processes by binding and cleaving their protein substrates. The cleavage can either be non-specific as part of degradation during protein catabolism or highly specific as part of proteolytic cascades and signal transduction events. Identification of these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited since they mainly represent the amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm using the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). The features of physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were utilized to represent the peptides concerned. We compared our method with existing tools for predicting possible cleavage sites in candidate substrates. It is shown that our method makes much more reliable predictions in terms of the overall prediction accuracy. In addition, this predictor allows the use of a wide range of proteinases.

17.
To study local structures in proteins, we previously developed an autoassociative artificial neural network (autoANN) and clustering tool to discover intrinsic features of macromolecular structures. The hidden unit activations computed by the trained autoANN are a convenient low-dimensional encoding of the local protein backbone structure. Clustering these activation vectors results in a unique classification of protein local structural features called Structural Building Blocks (SBBs). Here we describe application of this method to a larger database of proteins, verification of the applicability of this method to structure classification, and subsequent analysis of amino acid frequencies and several commonly occurring patterns of SBBs. The SBB classification method has several interesting properties: 1) it identifies the regular secondary structures, α helix and β strand; 2) it consistently identifies other local structure features (e.g., helix caps and strand caps); 3) strong amino acid preferences are revealed at some positions in some SBBs; and 4) distinct patterns of SBBs occur in the “random coil” regions of proteins. Analysis of these patterns identifies interesting structural motifs in the protein backbone structure, indicating that SBBs can be used as “building blocks” in the analysis of protein structure. This type of pattern analysis should increase our understanding of the relationship between protein sequence and local structure, especially in the prediction of protein structures. © 1997 Wiley-Liss, Inc.

18.
How to characterize short protein sequences to make an effective connection to their functions is an unsolved problem. Here we propose to map the physicochemical properties of each amino acid onto unit spheres so that each protein sequence can be represented quantitatively. We demonstrate the usefulness of this representation by applying it to the prediction of cell penetrating peptides. We show that its combination with traditional composition features yields the best performance across different datasets, among several methods compared. For the convenience of users, a web server has been established for automatic calculations of the proposed features at http://biophy.dzu.edu.cn/SNumD/.

19.
MOTIVATION: In protein chemistry, proteomics and biopharmaceutical development, there is a desire to know not only where a protein is cleaved by a protease, but also the susceptibility of its cleavage sites. The current tools for proteolytic cleavage prediction have often relied purely on regular expressions, or involve models that do not represent biological data well. RESULTS: A novel methodology for characterizing proteolytic cleavage site activities has been developed, which incorporates two fundamental features: activity class prediction and the use of an amino acid similarity matrix for (non-parametric) neural learning. The first solved the problem of predicting proteolytic efficiency. The second significantly improved the robustness in prediction and reduced the time complexity for learning. This study shows that activity class prediction is successful when applying this methodology to the prediction and characterization of Trypsin cleavage sites and the prediction of HIV protease cleavage sites. AVAILABILITY: Requests for software and data should be made respectively to Dr Zheng Rong Yang and Miss Rebecca Thomson.

20.
SUMMARY: Microarray data are generated in complex experiments and frequently compromised by a variety of systematic errors. Subsequent data normalization aims to correct these errors. Although several normalization methods have recently been proposed, they frequently fail to account for the variability of systematic errors within and between microarray experiments. However, optimal adjustment of normalization procedures to the underlying data structure is crucial for the efficiency of normalization. To overcome this restriction of current methods, we have developed two normalization schemes based on iterative local regression combined with model selection. The schemes have been demonstrated to improve considerably the quality of normalization. They are implemented in a freely available R package. Additionally, functions for visualization and detection of systematic errors in microarray data have been incorporated in the software package. A graphical user interface is also available. AVAILABILITY: The R package can be downloaded from http://itb.biologie.hu-berlin.de/~futschik/software/R/OLIN. It is licensed under the GPL version 2. CONTACT: m.futschik@biologie.hu-berlin.de SUPPLEMENTARY INFORMATION: Further information about the methods used in the OLIN software package can be found at http://itb.biologie.hu-berlin.de/~futschik/software/R/OLIN.
