Similar articles
 Found 20 similar articles (search took 10 ms)
1.
Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare conformational ensembles in the same way as the root mean square deviation is used to compare individual pairs of structures. The methods are based on the estimation of the probability distributions underlying the ensembles and subsequent comparison of these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single-molecule refinement.
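For orientation, the pairwise root mean square deviation that the abstract uses as its point of comparison can be sketched as follows. This is a minimal illustration, not the authors' ensemble method: it assumes the two conformations are already superimposed and have the same atoms in the same order, and the function name `rmsd` is chosen here for illustration only.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two conformations.

    Each argument is a list of (x, y, z) tuples. Assumption: the two
    structures are already optimally superimposed and atom i in one
    list corresponds to atom i in the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformations must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared Euclidean distance between corresponding atoms
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

An ensemble comparison of the kind described above would compare the distributions from which such conformations are drawn rather than a single pair of structures.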

2.
Protein structure prediction encompasses two major challenges: (1) the generation of a large ensemble of high-resolution structures for a given amino-acid sequence; and (2) the identification of the structure closest to the native structure for a blind prediction. In this article, we address the second challenge by proposing what is, to our knowledge, a novel iterative traveling-salesman problem-based clustering method to identify the structures of a protein, in a given ensemble, that are closest to the native structure. The method consists of an iterative procedure that, at each iteration, eliminates clusters of structures that are unlikely to be of similar fold to the native, based on a statistical analysis of cluster density and average spherical radius. The method, denoted as ICON, has been tested on four data sets: (1) 1400 proteins with high-resolution decoys; (2) medium-to-low resolution decoys from Decoys ‘R’ Us; (3) medium-to-low resolution decoys from the first-principles approach, ASTRO-FOLD; and (4) selected targets from CASP8. The extensive tests demonstrate that ICON can identify high-quality structures in each ensemble, regardless of the resolution of conformers. In a total of 1454 proteins, with an average of 1051 conformers per protein, the conformers selected by ICON are, on average, in the top 3.5% of the conformers in the ensemble.

3.
4.
Eukaryotic transmembrane helical (TMH) proteins perform a wide diversity of critical cellular functions, but remain structurally largely uncharacterized and their high-resolution structure prediction is currently hindered by the lack of close structural homologues. To address this problem, we present a novel and generic method for accurately modeling large TMH protein structures from distant homologues exhibiting distinct loop and TMH conformations. Models of the adenosine A2AR and chemokine CXCR4 receptors were ranked first in GPCR-DOCK blind prediction contests in the receptor structure accuracy category. In a benchmark of 50 TMH protein homolog pairs of diverse topology (from 5 to 12 TMHs), size (from 183 to 420 residues) and sequence identity (from 15% to 70%), the method improves most starting templates, and achieves near-atomic accuracy prediction of membrane-embedded regions. Unlike starting templates, the models are of suitable quality for computer-based protein engineering: redesigned models and redesigned X-ray structures exhibit very similar native interactions. The method should prove useful for the atom-level modeling and design of a large fraction of structurally uncharacterized TMH proteins from a wide range of structural homologues.

5.
1 Introduction. Protein fractal structure with geometric self-similarity has been studied for many years, including protein polypeptide chain fractal dimensions (FD) and the FD of protein surface structure [1-7]. In 1982, Havlin and Ben-Avraham [8] studied the fractal dimensionality of common polymer chains based on a self-avoiding walks (SAW) model, and presented a method for computing the FD of the polymer chain structure, equations (1)~(2). Allen et al. [2] computed protein FD by an assumed fractal model of pr…

6.

Background

Predicting protein function from primary sequence is an important open problem in modern biology. Not only are there many thousands of proteins of unknown function, but current approaches for predicting function must also be improved upon. One problem in particular is overly specific function predictions, which we address here with a new statistical model of the relationship between protein sequence similarity and protein function similarity.

Methodology

Our statistical model is based on sets of proteins with experimentally validated functions and numeric measures of function specificity and function similarity derived from the Gene Ontology. The model predicts the similarity of function between two proteins given their amino acid sequence similarity measured by statistics from the BLAST sequence alignment algorithm. A novel aspect of our model is that it predicts the degree of function similarity shared between two proteins over a continuous range of sequence similarity, facilitating prediction of function with an appropriate level of specificity.

Significance

Our model shows nearly exact function similarity for proteins with high sequence similarity (bit score >244.7, e-value >1e−62, non-redundant NCBI protein database (NRDB)) and only small likelihood of specific function match for proteins with low sequence similarity (bit score <54.6, e-value <1e−05, NRDB). For sequence similarity ranges in between, our annotation model shows an increasing relationship between function similarity and sequence similarity, but with considerable variability. We applied the model to a large set of proteins of unknown function, and predicted functions for thousands of these proteins ranging from general to very specific. We also applied the model to a data set of proteins with previously assigned, specific functions that were electronically based. We show that, on average, these prior function predictions are more specific (quite possibly overly specific) compared to predictions from our model, which is based on proteins with experimentally determined function.
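The bit scores and e-values quoted above are two views of the same alignment statistic. As a rough sketch, the standard Karlin-Altschul relation E = m · n · 2^(−S′) converts a bit score S′ into an expected number of chance hits for a query of length m searched against a database of total length n. The function name and the example lengths below are assumptions for illustration and are not taken from the abstract.

```python
def evalue_from_bit_score(bit_score, query_len, db_len):
    """Expected number of chance alignments with at least this bit score.

    Uses the Karlin-Altschul relation E = m * n * 2**(-S'), where
    m is the query length, n the total database length (both in
    residues) and S' the bit score. Edge-effect corrections applied
    by real BLAST implementations are ignored here.
    """
    return query_len * db_len * 2.0 ** (-bit_score)

# Illustrative only: a hypothetical 300-residue query against a
# hypothetical database of 1.5e9 residues.
high = evalue_from_bit_score(244.7, 300, 1.5e9)
low = evalue_from_bit_score(54.6, 300, 1.5e9)
```

Under this relation a higher bit score always corresponds to a smaller e-value for a fixed search space, so `high` above is many orders of magnitude smaller than `low`.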

7.

Background

Predicting protein subnuclear localization is a challenging problem. Previous approaches based on non-sequence information, including Gene Ontology annotations and kernel fusion, each have limitations. The aim of this work is twofold: first, to propose a novel individual feature extraction method; second, to develop an ensemble method that improves prediction performance using comprehensive information represented as a high-dimensional feature vector obtained by 11 feature extraction methods.

Methodology/Principal Findings

A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. In leave-one-out cross validation, the overall prediction accuracy is 75.2% for 6 localizations on the Lei dataset and 72.1% for 9 localizations on the SNL9 dataset; it is 71.7% for the multi-localization dataset and 69.8% for the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis.

Conclusions

It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method. It is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.

8.
Identification of proteins by mass spectrometry (MS) is an essential step in proteomic studies and is typically accomplished by either peptide mass fingerprinting (PMF) or amino acid sequencing of the peptide. Although sequence information from MS/MS analysis can be used to validate PMF-based protein identification, it may not be practical when analyzing a large number of proteins and when high-throughput MS/MS instrumentation is not readily available. At present, a vast majority of proteomic studies employ PMF. However, there are huge disparities in criteria used to identify proteins using PMF. Therefore, to reduce incorrect protein identification using PMF, and also to increase confidence in PMF-based protein identification without accompanying MS/MS analysis, definitive guiding principles are essential. To this end, we propose a value-based scoring system that provides guidance on evaluating when PMF-based protein identification can be deemed sufficient without accompanying amino acid sequence data from MS/MS analysis.

9.
Identification of proteins by mass spectrometry (MS) is an essential step in proteomic studies and is typically accomplished by either peptide mass fingerprinting (PMF) or amino acid sequencing of the peptide. Although sequence information from MS/MS analysis can be used to validate PMF-based protein identification, it may not be practical when analyzing a large number of proteins and when high-throughput MS/MS instrumentation is not readily available. At present, a vast majority of proteomic studies employ PMF. However, there are huge disparities in criteria used to identify proteins using PMF. Therefore, to reduce incorrect protein identification using PMF, and also to increase confidence in PMF-based protein identification without accompanying MS/MS analysis, definitive guiding principles are essential. To this end, we propose a value-based scoring system that provides guidance on evaluating when PMF-based protein identification can be deemed sufficient without accompanying amino acid sequence data from MS/MS analysis.

10.
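With the rapid development of proteomics research, and especially the recent emphasis on large-scale expression profiling, large numbers of proteins have been identified. This has also raised the problem of data quality control. How to remove false-positive results effectively, improve data reliability, and make data from different technical approaches, different mass spectrometers and different laboratories comparable is both a hot topic and a difficult problem in proteomics. This article reviews the data quality evaluation schemes currently in use.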

11.
In proteins, the polypeptide chain forms a number of right- and left-handed helices and superhelices, right- and left-turned hairpins, and some other structures that are nonsuperimposable, although they are not mirror images of each other as the L-amino acids are not converted to the D-amino acids. This property of protein structures will be referred to here as pseudo-chirality, or handedness. It has been shown that there are two kinds of handedness in proteins: helical handedness and handedness of arrangement. Some protein structures exhibit both kinds of handedness. Handedness is observed at all levels of protein structural organization, from α-helices, β-strands, hairpins and βαβ-units up to complex structural motifs, superhelices, and supramolecular structures in fibrous and polymer proteins. There are several structures that have unique handedness in proteins, for example, α-helices, αα-corners, βαβ-units, abcd-units, and so on. This property of the polypeptide chain is of particular value in protein folding and protein modeling, because it drastically reduces the number of possible folds.

12.
To date all attempts to derive a phyletic relationship among restriction endonucleases (ENases) from multiple sequence alignments have been limited by extreme divergence of these enzymes. Based on the approach of Johnson et al. (1990), I report for the first time the evolutionary tree of the ENase-like protein superfamily inferred from quantitative comparison of atomic coordinates of structurally characterized enzymes. The results presented are in harmony with previous comparisons obtained by crystallographic analyses. It is shown that λ-exonuclease initially diverged from the common ancestor and then two “endonucleolytic” families branched out, separating “blunt end cutters” from “5′ four-base overhang cutters.” These data may contribute to a better understanding of ENases and encourage the use of structure-based methods for inference of phylogenetic relationship among extremely divergent proteins. In addition, the comparison of three-dimensional structures of ENase-like domains provides a platform for further clustering analyses of sequence similarities among different branches of this large protein family, rational choice of homology modeling templates, and targets for protein engineering. Received: 14 June 1999 / Accepted: 11 August 1999

13.
The hydrophobic effect is the main driving force in protein folding. One can estimate the relative strength of this hydrophobic effect for each amino acid by mining a large set of experimentally determined protein structures. However, the hydrophobic force is known to be strongly temperature dependent. This temperature dependence is thought to explain the denaturation of proteins at low temperatures. Here we investigate if it is possible to extract this temperature dependence directly from a large set of protein structures determined at different temperatures. Using NMR structures filtered for sequence identity, we were able to extract hydrophobicity propensities for all amino acids at five different temperature ranges (spanning 265-340 K). These propensities show that the hydrophobicity becomes weaker at lower temperatures, in line with current theory. Alternatively, one can conclude that the temperature dependence of the hydrophobic effect has a measurable influence on protein structures. Moreover, this work provides a method for probing the individual temperature dependence of the different amino acid types, which is difficult to obtain by direct experiment.

14.

15.
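Explaining the various physical properties of proteins on the basis of mechanics and thermodynamics is foundational work for understanding biology from the perspective of physics. The irreversible thermodynamic theory of protein folding and the theory of the dynamic thermodynamic structure of proteins are discussed in detail. The theory infers that dynamic thermodynamic changes in proteins are the basic thermodynamic state unit of all changes in biological state and act as the molecular switch for changes in molecular biology. This theory is used to explain the physical properties of proteins and the biophysical mechanism of anesthetics.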

16.
Predicting off-targets by computational methods is gaining importance in the early stages of drug discovery. We herewith present a computational method based on binding site three-dimensional comparisons, which prompted us to investigate the cross-reaction of protein kinase inhibitors with synapsin I, an ATP-binding protein regulating neurotransmitter release in the synapse. Systematic pair-wise comparison of the staurosporine-binding site of the proto-oncogene Pim-1 kinase with 6,412 druggable protein-ligand binding sites suggested that the ATP-binding site of synapsin I may recognize the pan-kinase inhibitor staurosporine. This hypothesis was validated biochemically by competition experiments of staurosporine with ATP-γ35S for binding to synapsin I. Staurosporine, as well as three other inhibitors of protein kinases (cdk2, Pim-1 and casein kinase type 2), effectively bound to synapsin I with nanomolar affinities and promoted synapsin-induced F-actin bundling. The selective Pim-1 kinase inhibitor quercetagetin was shown to be the most potent synapsin I binder (IC50 = 0.15 µM), in agreement with the predicted binding site similarities between synapsin I and various protein kinases. Other protein kinase inhibitors (protein kinase A and chk1 inhibitor), kinase inhibitors (diacylglycerolkinase inhibitor) and various other ATP-competitors (DNA topoisomerase II and HSP-90α inhibitors) did not bind to synapsin I, as predicted from a lower similarity of their respective ATP-binding sites to that of synapsin I. The present data suggest that direct binding to synapsin I, and phosphorylation-independent perturbation of synapsin I function, may also contribute to the observed downregulation of neurotransmitter release by some but not all protein kinase inhibitors. More generally, the data also demonstrate that cross-reactivity with various targets may be detected by systematic pair-wise similarity measurement of ligand-annotated binding sites.

17.
The refinement of low-quality structures is an important challenge in protein structure prediction. Many studies have been conducted on protein structure refinement; the refinement of structures derived from NMR spectroscopy has been especially intensively studied. In this study, we generated a flat-bottom distance potential instead of using NOE data, because NOE data are ambiguous and uncertain. The potential was derived from distance information from given structures and prevented structural dislocation during the refinement process. A simulated annealing protocol was used to minimize the potential energy of the structure. The protocol was tested on 134 NMR structures in the Protein Data Bank (PDB) that also have X-ray structures. Among them, 50 structures were used as a training set to find the optimal “width” parameter in the flat-bottom distance potential functions. In the validation set (the other 84 structures), most of the 12 quality assessment scores of the refined structures were significantly improved (total score increased from 1.215 to 2.044). Moreover, the secondary structure similarity of the refined structure was improved over that of the original structure. Finally, we demonstrate that the combination of two energy potentials, statistical torsion angle potential (STAP) and the flat-bottom distance potential, can drive the refinement of NMR structures.
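The idea of a flat-bottom distance potential with a “width” parameter can be sketched as below. The abstract does not give the functional form, so the harmonic walls, the force constant `k` and the function name are assumptions made here for illustration rather than the authors' implementation.

```python
def flat_bottom_energy(distance, ref_distance, width, k=1.0):
    """Energy of a flat-bottom distance restraint.

    The energy is zero while the distance stays within `width` of the
    reference distance taken from the starting structure, and grows
    quadratically outside that window. The quadratic wall and the
    force constant `k` are illustrative assumptions.
    """
    excess = abs(distance - ref_distance) - width
    if excess <= 0.0:
        return 0.0  # inside the flat bottom: no penalty
    return k * excess ** 2
```

A wider flat region lets atoms move more freely during simulated annealing, while a narrower one holds the structure closer to the input; this is the trade-off that tuning the width on a training set would address.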

18.
Abstract

The neutral theory of evolution is extended to the origin of protein molecules. Arguments are presented which suggest that the amino acid sequences of many globular proteins mainly represent “memorized” random sequences, while biological evolution reduces to the “editing” of these random sequences. Physical requirements for a functional globular protein are formulated and it is shown that many of these requirements do not involve strategic selection of amino acid sequences during biological evolution but are also inherent in typical random sequences. In particular, it is shown that random sequences of polar and nonpolar amino acid residues can form α-helices and β-strands with lengths and arrangement along the chain similar to those in real globular proteins. These α- and β-regions in random sequences can form three-dimensional folding patterns also similar to those in real proteins. Arguments are presented suggesting that even the tight packing of side groups inside the protein core does not require very strong biological selection of amino acid sequences either. Thus many structural features of real proteins can also exist in random sequences, and biological selection is needed mainly for the creation of active sites of proteins and for their stability under physiological conditions.

19.
Coiled coils are protein structure domains with two or more α-helices packed together via interlacing of side chains known as knob-into-hole packing. We analysed and classified a large set of coiled-coil structures using a combination of automated and manual methods. This led to a systematic classification that we termed a “periodic table of coiled coils,” which we have made available at http://coiledcoils.chm.bris.ac.uk/ccplus/search/periodic_table. In this table, coiled-coil assemblies are arranged in columns with increasing numbers of α-helices and in rows of increased complexity. The table provides a framework for understanding possibilities in and limits on coiled-coil structures and a basis for future prediction, engineering and design studies.

20.
In the era of metagenomics and diagnostics sequencing, the importance of protein comparison methods of boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that can be further used by the community to improve the speed and lower hardware requirements of bioinformatics software. The optimization starts at the lookup table construction, then the initial lookup table–based hits are passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects ‘similarity zones’ aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan performs 5 to 100 times faster than the standard NCBI BLASTP, depending on chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to the NCBI BLASTP’s and decrease with the increase of speed, yet stay at the levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences. Comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI’s non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with the NCBI BLASTP takes over 66 hours. We describe innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.
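The first two stages mentioned above, a lookup table followed by grouping of hits on diagonals, follow a pattern common to seed-based search tools. The sketch below shows that generic pattern only: exact k-mer seeds, a dictionary as the lookup table, and the function names are all assumptions made for illustration, and it does not reproduce PSimScan's similarity-zone algorithm.

```python
from collections import defaultdict

def build_lookup(subject, k):
    """Map every k-mer of the subject sequence to its start positions."""
    table = defaultdict(list)
    for j in range(len(subject) - k + 1):
        table[subject[j:j + k]].append(j)
    return table

def hits_by_diagonal(query, subject, k=3):
    """Group exact k-mer matches by diagonal (subject pos - query pos).

    Matches that share a diagonal are candidates for belonging to the
    same ungapped alignment; a later stage would score and filter them.
    """
    table = build_lookup(subject, k)
    diagonals = defaultdict(list)
    for i in range(len(query) - k + 1):
        for j in table.get(query[i:i + k], ()):
            diagonals[j - i].append((i, j))
    return dict(diagonals)
```

For the query `"ACDEFG"` and subject `"XXACDEFG"` with `k=3`, all four seed hits fall on diagonal 2, which is the kind of clustering a diagonal-based filter looks for before running more expensive alignment steps.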
