Similar literature
 Found 20 similar articles (search time: 0 ms)
1.
Protein structure prediction in the postgenomic era   Total citations: 3 (self-citations: 0; citations by others: 3)
As the number of completely sequenced genomes rapidly increases, the postgenomic problem of gene function identification becomes ever more pressing. Predicting the structures of proteins encoded by genes of interest is one possible means to glean subtle clues as to the functions of these proteins. There are limitations to this approach to gene identification and a survey of the expected reliability of different protein structure prediction techniques has been undertaken.

2.
Significant advances have been achieved in protein structure prediction, especially with the recent development of the AlphaFold2 and RoseTTAFold systems. This article reviews the progress in deep learning-based protein structure prediction methods in the past two years. First, we divide the representative methods into two categories: the two-step approach and the end-to-end approach. Then, we show that the two-step approach can achieve accuracy similar to that of the state-of-the-art end-to-end approach, AlphaFold2. Compared to the end-to-end approach, the two-step approach requires fewer computing resources. We conclude that it is valuable to keep developing both approaches. Finally, a few outstanding challenges in function-orientated protein structure prediction are pointed out for future development.

3.
Protein complexes carry out almost all signaling and functional processes in the cell. The protein complex complement of a cell, and its network of complex–complex interactions, is referred to here as the complexome. Computational methods to predict protein complexes from proteomics data, resulting in network representations of complexomes, have recently been developed. In addition, key advances have been made toward understanding the network and structural organization of complexomes. We review these bioinformatics advances and their discovery potential, as well as the merits of integrating proteomics data with emerging methods in systems biology to study protein complex signaling. It is envisioned that improved integration of proteomics and systems biology, incorporating the dynamics of protein complexes in space and time, may lead to more predictive models of cell signaling networks for effective modulation.

4.
J M Chandonia, M Karplus. Proteins, 1999, 35(3): 293-306
A primary and a secondary neural network are applied to secondary structure and structural class prediction for a database of 681 non-homologous protein chains. A new method of decoding the outputs of the secondary structure prediction network is used to produce an estimate of the probability of finding each type of secondary structure at every position in the sequence. In addition to providing a reliable estimate of the accuracy of the predictions, this method gives a more accurate Q3 (74.6%) than the cutoff method which is commonly used. Use of these predictions in jury methods improves the Q3 to 74.8%, the best available at present. On a database of 126 proteins commonly used for comparison of prediction methods, the jury predictions are 76.6% accurate. An estimate of the overall Q3 for a given sequence is made by averaging the estimated accuracy of the prediction over all residues in the sequence. As an example, the analysis is applied to the target beta-cryptogein, which was a difficult target for ab initio predictions in the CASP2 study; it shows that the prediction made with the present method (62% of residues correct) is close to the expected accuracy (66%) for this protein. The larger database and use of a new network training protocol also improve structural class prediction accuracy to 86%, relative to 80% obtained previously. Secondary structure content is predicted with accuracy comparable to that obtained with spectroscopic methods, such as vibrational or electronic circular dichroism and Fourier transform infrared spectroscopy.
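The Q3 score and the per-sequence accuracy estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes three-state labels (H, E, C) written as strings, and it assumes that the "estimated accuracy" of a residue is the probability the network assigns to its most probable state; the abstract does not spell out that detail, and the function names `q3` and `estimated_q3` are hypothetical.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def estimated_q3(state_probabilities: list) -> float:
    """Estimate Q3 for one sequence without knowing the true structure.

    state_probabilities holds one dict per residue, mapping each state to its
    estimated probability. Assumption: the expected accuracy of a residue is
    the probability of its most probable (i.e. predicted) state.
    """
    if not state_probabilities:
        raise ValueError("empty sequence")
    total = sum(max(probs.values()) for probs in state_probabilities)
    return 100.0 * total / len(state_probabilities)
```

Comparing `estimated_q3` with the `q3` obtained once the structure is known mirrors the beta-cryptogein comparison in the abstract (66% expected versus 62% observed).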

5.
The gap between the number of protein sequences and protein structures is increasing rapidly, exacerbated by the completion of numerous genome projects now flooding into public databases. To fill this gap, comparative protein modelling is widely considered the most accurate technique for predicting the three-dimensional shape of proteins. High-throughput, automatic protein modelling should considerably increase our access to protein structures other than those determined by experimental techniques such as X-ray crystallography and NMR (nuclear magnetic resonance) spectroscopy. The uses for these complete three-dimensional models are growing rapidly, ranging from guiding site-directed mutagenesis experiments to protein-protein interaction predictions. In recognition of this, a number of very useful comparative modelling servers have begun to emerge on the Web. Molecular biologists now have a powerful web-based toolkit to construct models, assess their accuracy, and use them to explain and predict experiments. There is, however, still much to do by those engaged in algorithmic development if comparative modelling is to compete on an equal footing with experimental protein structure determination techniques.

6.
7.
For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization, which imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold proposes a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving performance similar to that of traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Its prediction is fast, with an inference time of about 160 ms per sequence up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.

8.
SUMMARY: Porter is a new system for protein secondary structure prediction in three classes. Porter relies on bidirectional recurrent neural networks with shortcut connections, accurate coding of input profiles obtained from multiple sequence alignments, second stage filtering by recurrent neural networks, incorporation of long range information and large-scale ensembles of predictors. Porter's accuracy, tested by rigorous 5-fold cross-validation on a large set of proteins, exceeds 79%, significantly above a copy of the state-of-the-art SSpro server, better than any system published to date. AVAILABILITY: Porter is available as a public web server at http://distill.ucd.ie/porter/ CONTACT: gianluca.pollastri@ucd.ie.

9.
A combination of selective 1H nuclear Overhauser effects and other evidence indicates that the headpiece of the lac-repressor protein folds back on itself “head-to-tail” with residues in the N-terminal and C-terminal portions near to each other.

10.

Background  

Caspases are a family of proteases that have central functions in programmed cell death (apoptosis) and inflammation. Caspases mediate their effects through aspartate-specific cleavage of their target proteins, and at present almost 400 caspase substrates are known. Several methods have been developed to predict caspase cleavage sites in individual proteins, but currently none of them can be used to predict caspase cleavage sites in multiple proteins or entire proteomes, or to use several classifiers in combination. The ability to create a database of predicted caspase cleavage products for the whole genome could significantly aid in identifying novel caspase targets from tandem mass spectrometry-based proteomic experiments.

11.
12.

13.
A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequence, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation are new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of > 80%. Existing high-accuracy prediction methods are "black-box" predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥ 90 residues and < 170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.

14.
The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may significantly benefit from more accurate secondary structure predictions. Although there are many different secondary structure prediction methods available in the literature, their cross-validated prediction accuracy is generally <80%. In order to increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a certain sequence identity threshold, the FDM method is applied for the prediction. The remainder of the fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that the value 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q3 ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.
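The threshold rule described in this abstract (FDM for fragments whose sequence identity exceeds the threshold, GOR V for the rest) can be sketched as a simple dispatch. This is only an illustration under stated assumptions: fragments are taken to be non-overlapping and already paired with their best identity to a PDB fragment, and `fdm_predict` and `gorv_predict` are hypothetical stand-ins for the two real predictors, which are not reproduced here.

```python
from typing import Callable, Sequence


def consensus_predict(
    fragments: Sequence[str],
    identities: Sequence[float],
    fdm_predict: Callable[[str], str],
    gorv_predict: Callable[[str], str],
    threshold: float = 50.0,
) -> str:
    """Predict each fragment with FDM or GOR V and join the results.

    identities[i] is the best sequence identity (in percent) between
    fragments[i] and any fragment found in the structure database.
    The default threshold of 50% is the optimum reported in the abstract.
    """
    if len(fragments) != len(identities):
        raise ValueError("need exactly one identity value per fragment")
    parts = []
    for fragment, identity in zip(fragments, identities):
        # Strictly above the threshold: trust the structural fragment (FDM).
        # Otherwise fall back to the sequence-based predictor (GOR V).
        predictor = fdm_predict if identity > threshold else gorv_predict
        parts.append(predictor(fragment))
    return "".join(parts)
```

With stub predictors that return all-H and all-C strings respectively, `consensus_predict(["AAA", "GG"], [80.0, 30.0], ...)` yields `"HHHCC"`, which makes it easy to see which predictor handled each fragment.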

15.
16.
17.
18.
On the structure of the folded chromosome of Escherichia coli   Total citations: 131 (self-citations: 0; citations by others: 131)

19.
Protein structure prediction has great potential for elucidating the function of proteins at the molecular level and for designing novel protein functions. Here, we report a rapid and accurate structure prediction system that runs in an automated manner. Since fold recognition of the target protein to be modeled is the starting point of the template-guided model building process, various approaches (such as profile analysis, threading, and SCOP fold classification) have been applied to generate the template library and to select the best template structure. After the best template was determined, fold consistency within the template candidates was considered using TM-score and the SCOP database to select additional template structures from the template library. To generate a total of 100 decoy sets, MODELLER was used with the selected template structure. The predicted decoys were clustered with an RMSD criterion of 3 Å to obtain centroids from each cluster. Finally, the selected centroids were subjected to side-chain rearrangement using the SCWRL module. Our fully automated structure prediction system was examined with sample test sets consisting of 80 recently released PDB chains. Judged by the TM-score (≥0.4), we concluded that 60 cases (75%) showed similar structures of statistical significance. This prediction system provides users with simple and reliable models within hours of query submission, so it can readily be used for high-throughput enzyme screening.
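The decoy-clustering step can be pictured with a small sketch. The abstract gives only the 3 Å RMSD criterion, so the greedy neighbour-count scheme below is an assumption and not necessarily the authors' algorithm; the function name `cluster_by_rmsd` is hypothetical, and the pairwise RMSD matrix is assumed to be precomputed, symmetric, and zero on the diagonal.

```python
def cluster_by_rmsd(rmsd, cutoff=3.0):
    """Greedy clustering of decoys from a pairwise RMSD matrix (in Å).

    Repeatedly picks the unassigned decoy with the most unassigned
    neighbours within `cutoff`, makes it the cluster representative,
    and removes it together with those neighbours.
    Returns a list of (representative_index, member_indices) tuples.
    """
    unassigned = set(range(len(rmsd)))
    clusters = []
    while unassigned:
        candidates = sorted(unassigned)  # sorted so that ties resolve to the lowest index

        def neighbour_count(i):
            return sum(rmsd[i][j] <= cutoff for j in unassigned)

        representative = max(candidates, key=neighbour_count)
        members = [j for j in candidates if rmsd[representative][j] <= cutoff]
        clusters.append((representative, members))
        unassigned -= set(members)
    return clusters
```

In this sketch the representative plays the role of the cluster centroid that would then be passed on to side-chain refinement.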

20.
A software system, SOSUI, was previously developed for discriminating between soluble and membrane proteins and predicting transmembrane regions (Hirokawa et al., Bioinformatics, 14 (1998) 378-379). The performance of the system was 99% for the discrimination between the two types of proteins and 96% for the prediction of transmembrane helices. When all of the amino acid sequences from 15 single-cell organisms were analyzed by SOSUI, the proportion of predicted polytopic membrane proteins showed an almost constant value of 15-20%, irrespective of the total genome size. However, single-cell organisms appeared to be categorized in terms of the preference of the number of transmembrane segments: species with small genomes were characterized by a significant peak at a helix number of approximately six or seven; species with large genomes showed a peak at 10 or 11 helices; and species with intermediate genome sizes showed a monotonic decrease of the population of membrane proteins against the number of transmembrane helices.
