首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Hidden Markov Models (HMMs) are practical tools which provide probabilistic base for protein secondary structure prediction. In these models, usually, only the information of the left hand side of an amino acid is considered. Accordingly, these models seem to be inefficient with respect to long range correlations. In this work we discuss a Segmental Semi Markov Model (SSMM) in which the information of both sides of amino acids are considered. It is assumed and seemed reasonable that the information on both sides of an amino acid can provide a suitable tool for measuring dependencies. We consider these dependencies by dividing them into shorter dependencies. Each of these dependency models can be applied for estimating the probability of segments in structural classes. Several conditional probabilities concerning dependency of an amino acid to the residues appeared on its both sides are considered. Based on these conditional probabilities a weighted model is obtained to calculate the probability of each segment in a structure. This results in 2.27% increase in prediction accuracy in comparison with the ordinary Segmental Semi Markov Models, SSMMs. We also compare the performance of our model with that of the Segmental Semi Markov Model introduced by Schmidler et al. [C.S. Schmidler, J.S. Liu, D.L. Brutlag, Bayesian segmentation of protein secondary structure, J. Comp. Biol. 7(1/2) (2000) 233-248]. The calculations show that the overall prediction accuracy of our model is higher than the SSMM introduced by Schmidler.  相似文献   

2.
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. The segmental model is a generalization of the hidden Markov model where a hidden state generates segments of various length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and the generalization performance is promising. By incorporating the information from long range interactions in /spl beta/-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.  相似文献   

3.
邹应斌  米湘成  石纪成 《生态学报》2004,24(12):2967-2972
研究利用人工神经网络模型 ,以水稻群体分蘖动态为例 ,采用交互验证和独立验证的方式 ,对水稻生长 BP网络模型进行了训练与模拟 ,其结果与水稻群体分蘖的积温统计模型、基本动力学模型和复合分蘖模型进行了比较。研究结果表明 ,神经网络模型具有一定的外推能力 ,但其外推能力依赖于大量的训练样本。神经网络模型具有较好的拟合能力 ,是因为有较多的模型参数 ,因此对神经网络模型的训练需要大量的参数来保证其参数不致过度吻合。具有外推能力神经网络模型的最少训练样本数应大于 6 .75倍于神经网络参数数目 ,小于 13.5倍于神经网络参数数目。因此在应用神经网络模型时 ,如果神经网络模型包括较多的输入变量时 ,可考虑采用主成分分析、对应分析等技术对输入变量进行信息综合 ,相应地减少网络模型的参数。另一方面 ,当训练样本不足时 ,最好只用神经网络模型对同一系统的情况进行模拟 ,应谨慎使用神经网络模型进行外推。神经网络模型给作物模拟研究的科学工作者提供了一个“傻瓜”式工具 ,对数学建模不熟悉的农业研究人员 ,人工神经网络可以替代数学建模进行仿真实验 ;对于精通数学建模的研究人员来说 ,它至少是一种补充和可作为比较的非线性数据处理方法  相似文献   

4.
Using evolutionary information contained in multiple sequence alignments as input to neural networks, secondary structure can be predicted at significantly increased accuracy. Here, we extend our previous three-level system of neural networks by using additional input information derived from multiple alignments. Using a position-specific conservation weight as part of the input increases performance. Using the number of insertions and deletions reduces the tendency for overprediction and increases overall accuracy. Addition of the global amino acid content yields a further improvement, mainly in predicting structural class. The final network system has a sustained overall accuracy of 71.6% in a multiple cross-validation test on 126 unique protein chains. A test on a new set of 124 recently solved protein structures that have no significant sequence similarity to the learning set confirms the high level of accuracy. The average cross-validated accuracy for all 250 sequence-unique chains is above 72%. Using various data sets, the method is compared to alternative prediction methods, some of which also use multiple alignments: the performance advantage of the network system is at least 6 percentage points in three-state accuracy. In addition, the network estimates secondary structure content from multiple sequence alignments about as well as circular dichroism spectroscopy on a single protein and classifies 75% of the 250 proteins correctly into one of four protein structural classes. Of particular practical importance is the definition of a position-specific reliability index. For 40% of all residues the method has a sustained three-state accuracy of 88%, as high as the overall average for homology modelling. A further strength of the method is greatly increased accuracy in predicting the placement of secondary structure segments. © 1994 Wiley-Liss, Inc.  相似文献   

5.
Characterizing and classifying regularities in protein structure is an important element in uncovering the mechanisms that regulate protein structure, function and evolution. Recent research concentrates on analysis of structural motifs that can be used to describe larger, fold-sized structures based on homologous primary sequences. At the same time, accuracy of secondary protein structure prediction based on multiple sequence alignment drops significantly when low homology (twilight zone) sequences are considered. To this end, this paper addresses a problem of providing an alternative sequences representation that would improve ability to distinguish secondary structures for the twilight zone sequences without using alignment. We consider a novel classification problem, in which, structural motifs, referred to as structural fragments (SFs) are defined as uniform strand, helix and coil fragments. Classification of SFs allows to design novel sequence representations, and to investigate which other factors and prediction algorithms may result in the improved discrimination. Comprehensive experimental results show that statistically significant improvement in classification accuracy can be achieved by: (1) improving sequence representations, and (2) removing possible noise on the terminal residues in the SFs. Combining these two approaches reduces the error rate on average by 15% when compared to classification using standard representation and noisy information on the terminal residues, bringing the classification accuracy to over 70%. Finally, we show that certain prediction algorithms, such as neural networks and boosted decision trees, are superior to other algorithms.This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).  相似文献   

6.
7.
Protein chemical shifts encode detailed structural information that is difficult and computationally costly to describe at a fundamental level. Statistical and machine learning approaches have been used to infer correlations between chemical shifts and secondary structure from experimental chemical shifts. These methods range from simple statistics such as the chemical shift index to complex methods using neural networks. Notwithstanding their higher accuracy, more complex approaches tend to obscure the relationship between secondary structure and chemical shift and often involve many parameters that need to be trained. We present hidden Markov models (HMMs) with Gaussian emission probabilities to model the dependence between protein chemical shifts and secondary structure. The continuous emission probabilities are modeled as conditional probabilities for a given amino acid and secondary structure type. Using these distributions as outputs of first‐ and second‐order HMMs, we achieve a prediction accuracy of 82.3%, which is competitive with existing methods for predicting secondary structure from protein chemical shifts. Incorporation of sequence‐based secondary structure prediction into our HMM improves the prediction accuracy to 84.0%. Our findings suggest that an HMM with correlated Gaussian distributions conditioned on the secondary structure provides an adequate generative model of chemical shifts. Proteins 2013; © 2012 Wiley Periodicals, Inc.  相似文献   

8.
SUMMARY: Porter is a new system for protein secondary structure prediction in three classes. Porter relies on bidirectional recurrent neural networks with shortcut connections, accurate coding of input profiles obtained from multiple sequence alignments, second stage filtering by recurrent neural networks, incorporation of long range information and large-scale ensembles of predictors. Porter's accuracy, tested by rigorous 5-fold cross-validation on a large set of proteins, exceeds 79%, significantly above a copy of the state-of-the-art SSpro server, better than any system published to date. AVAILABILITY: Porter is available as a public web server at http://distill.ucd.ie/porter/ CONTACT: gianluca.pollastri@ucd.ie.  相似文献   

9.
A priori knowledge of secondary structure content can be of great use in theoretical and experimental determination of protein structure. We present a method that uses two computer-simulated neural networks placed in "tandem" to predict the secondary structure content of water-soluble, globular proteins. The first of the two networks, NET1, predicts a protein's helix and strand content given information about the protein's amino acid composition, molecular weight and heme presence. Because NET1 contained more adjustable parameters (network weights) than learning examples, this network experienced problems with memorization, which is the inability to generalize onto new, never-seen-before examples. To overcome this problem, we designed a second network, NET2, which learned to determine when NET1 was in a state of generalization. Together, these two networks produce prediction errors as low as 5.0% and 5.6% for helix and strand content, respectively, on a set of protein crystal structures bearing little homology to those used in network training. A comparison between three other methods including a multiple linear regression analysis, a non-hidden-node network analysis and a secondary structure assignment analysis reveals that our tandem neural network scheme is, indeed, the best method for predicting secondary structure content. The results of our analysis suggest that the knowledge of sequence information is not necessary for highly accurate predictions of protein secondary structure content.  相似文献   

10.
Computational neural networks have recently been used to predict the mapping between protein sequence and secondary structure. They have proven adequate for determining the first-order dependence between these two sets, but have, until now, been unable to garner higher-order information that helps determine secondary structure. By adding neural network units that detect periodicities in the input sequence, we have modestly increased the secondary structure prediction accuracy. The use of tertiary structural class causes a marked increase in accuracy. The best case prediction was 79% for the class of all-alpha proteins. A scheme for employing neural networks to validate and refine structural hypotheses is proposed. The operational difficulties of applying a learning algorithm to a dataset where sequence heterogeneity is under-represented and where local and global effects are inadequately partitioned are discussed.  相似文献   

11.
Kaur H  Raghava GP 《FEBS letters》2004,564(1-2):47-57
In this study, an attempt has been made to develop a neural network-based method for predicting segments in proteins containing aromatic-backbone NH (Ar-NH) interactions using multiple sequence alignment. We have analyzed 3121 segments seven residues long containing Ar-NH interactions, extracted from 2298 non-redundant protein structures where no two proteins have more than 25% sequence identity. Two consecutive feed-forward neural networks with a single hidden layer have been trained with standard back-propagation as learning algorithm. The performance of the method improves from 0.12 to 0.15 in terms of Matthews correlation coefficient (MCC) value when evolutionary information (multiple alignment obtained from PSI-BLAST) is used as input instead of a single sequence. The performance of the method further improves from MCC 0.15 to 0.20 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields an overall prediction accuracy of 70.1% and an MCC of 0.20 when tested by five-fold cross-validation. Overall the performance is 15.2% higher than the random prediction. The method consists of two neural networks: (i) a sequence-to-structure network which predicts the aromatic residues involved in Ar-NH interaction from multiple alignment of protein sequences and (ii) a structure-to structure network where the input consists of the output obtained from the first network and predicted secondary structure. Further, the actual position of the donor residue within the 'potential' predicted fragment has been predicted using a separate sequence-to-structure neural network. Based on the present study, a server Ar_NHPred has been developed which predicts Ar-NH interaction in a given amino acid sequence. The web server Ar_NHPred is available at and (mirror site).  相似文献   

12.
This study investigated the use of neural networks in the identification of Escherichia coli ribosome binding sites. The recognition of these sites based on primary sequence data is difficult due to the multiple determinants that define them. Additionally, secondary structure plays a significant role in the determination of the site and this information is difficult to include in the models. Efforts to solve this problem have so far yielded poor results. A new compilation of E. coli ribosome binding sites was generated for this study. Feedforward backpropagation networks were applied to their identification. Perceptrons were also applied, since they have been the previous best method since 1982. Evaluation of performance for all the neural networks and perceptrons was determined by ROC analysis. The neural network provided significant improvement in the recognition of these sites when compared with the previous best method, finding less than half the number of false positives when both models were adjusted to find an equal number of actual sites. The best neural network used an input window of 101 nucleotides and a single hidden layer of 9 units. Both the neural network and the perceptron trained on the new compilation performed better than the original perceptron published by Stormo et al. in 1982.  相似文献   

13.
To improve recognition results, decisions of multiple neural networks can be aggregated into a committee decision. In contrast to the ordinary approach of utilizing all neural networks available to make a committee decision, we propose creating adaptive committees, which are specific for each input data point. A prediction network is used to identify classification neural networks to be fused for making a committee decision about a given input data point. The jth output value of the prediction network expresses the expectation level that the jth classification neural network will make a correct decision about the class label of a given input data point. The proposed technique is tested in three aggregation schemes, namely majority vote, averaging, and aggregation by the median rule and compared with the ordinary neural networks fusion approach. The effectiveness of the approach is demonstrated on two artificial and three real data sets.  相似文献   

14.
Kaur H  Raghava GP 《Proteins》2004,55(1):83-90
In this paper a systematic attempt has been made to develop a better method for predicting alpha-turns in proteins. Most of the commonly used approaches in the field of protein structure prediction have been tried in this study, which includes statistical approach "Sequence Coupled Model" and machine learning approaches; i) artificial neural network (ANN); ii) Weka (Waikato Environment for Knowledge Analysis) Classifiers and iii) Parallel Exemplar Based Learning (PEBLS). We have also used multiple sequence alignment obtained from PSIBLAST and secondary structure information predicted by PSIPRED. The training and testing of all methods has been performed on a data set of 193 non-homologous protein X-ray structures using five-fold cross-validation. It has been observed that ANN with multiple sequence alignment and predicted secondary structure information outperforms other methods. Based on our observations we have developed an ANN-based method for predicting alpha-turns in proteins. The main components of the method are two feed-forward back-propagation networks with a single hidden layer. The first sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position specific scoring matrices. The initial predictions obtained from the first network and PSIPRED predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. The final network yields an overall prediction accuracy of 78.0% and MCC of 0.16. A web server AlphaPred (http://www.imtech.res.in/raghava/alphapred/) has been developed based on this approach.  相似文献   

15.
Prediction of transmembrane spans and secondary structure from the protein sequence is generally the first step in the structural characterization of (membrane) proteins. Preference of a stretch of amino acids in a protein to form secondary structure and being placed in the membrane are correlated. Nevertheless, current methods predict either secondary structure or individual transmembrane states. We introduce a method that simultaneously predicts the secondary structure and transmembrane spans from the protein sequence. This approach not only eliminates the necessity to create a consensus prediction from possibly contradicting outputs of several predictors but bears the potential to predict conformational switches, i.e., sequence regions that have a high probability to change for example from a coil conformation in solution to an α‐helical transmembrane state. An artificial neural network was trained on databases of 177 membrane proteins and 6048 soluble proteins. The output is a 3 × 3 dimensional probability matrix for each residue in the sequence that combines three secondary structure types (helix, strand, coil) and three environment types (membrane core, interface, solution). The prediction accuracies are 70.3% for nine possible states, 73.2% for three‐state secondary structure prediction, and 94.8% for three‐state transmembrane span prediction. These accuracies are comparable to state‐of‐the‐art predictors of secondary structure (e.g., Psipred) or transmembrane placement (e.g., OCTOPUS). The method is available as web server and for download at www.meilerlab.org . Proteins 2013; 81:1127–1140. © 2013 Wiley Periodicals, Inc.  相似文献   

16.
Chao Fang  Yi Shang  Dong Xu 《Proteins》2018,86(5):592-598
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception‐inside‐inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD‐SS. The input to MUFOLD‐SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio‐chemical properties of amino acids, PSI‐BLAST profile, and HHBlits profile. MUFOLD‐SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD‐SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD‐SS outperformed the best existing methods and other deep neural networks significantly. MUFold‐SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html .  相似文献   

17.
Molecular evolution may be considered as a walk in a multidimensional fitness landscape, where the fitness at each point is associated with features such as the function, stability, and survivability of these molecules. We present a simple model for the evolution of protein sequences on a landscape with a precisely defined fitness function. We use simple lattice models to represent protein structures, with the ability of a protein sequence to fold into the structure with lowest energy, quantified as the foldability, representing the fitness of the sequence. The foldability of the sequence is characterized based on the spin glass model of protein folding. We consider evolution as a walk in this foldability landscape and study the nature of the landscape and the resulting dynamics. Selective pressure is explicitly included in this model in the form of a minimum foldability requirement. We find that different native structures are not evenly distributed in interaction space, with similar structures and structures with similar optimal foldabilities clustered together. Evolving proteins marginally fulfill the selective criteria of foldability. As the selective pressure is increased, evolutionary trajectories become increasingly confined to “neutral networks,” where the sequence and the interactions can be significantly changed while a constant structure is maintained. © 1997 John Wiley & Sons, Inc. Biopoly 42: 427–438, 1997  相似文献   

18.
Environmental DNA (eDNA) metabarcoding provides an efficient approach for documenting biodiversity patterns in marine and terrestrial ecosystems. The complexity of these data prevents current methods from extracting and analyzing all the relevant ecological information they contain, and new methods may provide better dimensionality reduction and clustering. Here we present two new deep learning-based methods that combine different types of neural networks (NNs) to ordinate eDNA samples and visualize ecosystem properties in a two-dimensional space: the first is based on variational autoencoders and the second on deep metric learning. The strength of our new methods lies in the combination of two inputs: the number of sequences found for each molecular operational taxonomic unit (MOTU) detected and their corresponding nucleotide sequence. Using three different datasets, we show that our methods accurately represent several biodiversity indicators in a two-dimensional latent space: MOTU richness per sample, sequence α-diversity per sample, Jaccard's and sequence β-diversity between samples. We show that our nonlinear methods are better at extracting features from eDNA datasets while avoiding the major biases associated with eDNA. Our methods outperform traditional dimension reduction methods such as Principal Component Analysis, t-distributed Stochastic Neighbour Embedding, Nonmetric Multidimensional Scaling and Uniform Manifold Approximation and Projection for dimension reduction. Our results suggest that NNs provide a more efficient way of extracting structure from eDNA metabarcoding data, thereby improving their ecological interpretation and thus biodiversity monitoring.  相似文献   

19.
20.
Neural networks are increasingly being used in science to infer hidden dynamics of natural systems from noisy observations, a task typically handled by hierarchical models in ecology. This article describes a class of hierarchical models parameterised by neural networks – neural hierarchical models. The derivation of such models analogises the relationship between regression and neural networks. A case study is developed for a neural dynamic occupancy model of North American bird populations, trained on millions of detection/non‐detection time series for hundreds of species, providing insights into colonisation and extinction at a continental scale. Flexible models are increasingly needed that scale to large data and represent ecological processes. Neural hierarchical models satisfy this need, providing a bridge between deep learning and ecological modelling that combines the function representation power of neural networks with the inferential capacity of hierarchical models.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号