20 similar documents found; search time: 0 ms
1.
The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure‐based approaches showing considerable promise. In this article, we present FINDSITE‐metal, a new threading‐based method designed specifically to detect metal‐binding sites in modeled protein structures. Comprehensive benchmarks using protein structures of varying quality show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE‐metal. Combining structure/evolutionary information with machine learning results in highly accurate metal‐binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal‐binding sites are detected with the best predicted binding site at rank 1 and within the top two ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium, and magnesium ions, the binding metal can be predicted with high, typically 70% to 90%, accuracy. FINDSITE‐metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome‐wide application of FINDSITE‐metal that quantifies the metal‐binding complement of the human proteome. FINDSITE‐metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite‐metal/ . Proteins 2011. © 2010 Wiley‐Liss, Inc.
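To make the "best of top five within 4 Å (8 Å)" success criterion concrete, here is a minimal Python sketch of how such a rate could be computed. The function names and the input layout (per target: ranked predicted metal coordinates and crystal-structure metal coordinates, in Å) are assumptions for illustration, not FINDSITE‐metal's code.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]  # x, y, z in Å


def best_of_top_k_distance(predicted: Sequence[Point], bound: Sequence[Point], k: int = 5) -> float:
    """Smallest distance between any of the top-k ranked predictions and any bound metal."""
    return min(math.dist(p, m) for p in predicted[:k] for m in bound)


def success_rate(targets: Sequence[tuple[Sequence[Point], Sequence[Point]]],
                 cutoff: float, k: int = 5) -> float:
    """Fraction of targets whose best top-k prediction lies within `cutoff` Å of a bound metal."""
    if not targets:
        raise ValueError("no targets")
    hits = sum(1 for predicted, bound in targets
               if best_of_top_k_distance(predicted, bound, k) <= cutoff)
    return hits / len(targets)
```

Calling `success_rate(targets, 4.0)` and `success_rate(targets, 8.0)` on the same target list yields the paired percentages reported in the abstract's style.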
2.
MOTIVATION: Protein backbone torsion angle prediction provides useful local structural information that goes beyond conventional three-state (alpha, beta and coil) secondary structure predictions. Accurate prediction of protein backbone torsion angles will substantially improve modeling procedures for local structures of protein sequence segments, especially in modeling loop conformations that do not form regular structures as in alpha-helices or beta-strands. RESULTS: We have devised two novel automated methods in protein backbone conformational state prediction: one method is based on support vector machines (SVMs); the other method combines a standard feed-forward back-propagation artificial neural network (NN) with a local structure-based sequence profile database (LSBSP1). Extensive benchmark experiments demonstrate that both methods have improved the prediction accuracy rate over the previously published methods for conformation state prediction when using an alphabet of three or four states. AVAILABILITY: LSBSP1 and the NN algorithm have been implemented in PrISM.1, which is available from www.columbia.edu/~ay1/. SUPPLEMENTARY INFORMATION: Supplementary data for the SVM method can be downloaded from the Website www.cs.columbia.edu/compbio/backbone.
3.
4.
The identification and annotation of protein domains is a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be hindered by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detecting domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within ±20 residues of the linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two-domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity.
Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. Used as a first-line predictor, Armadillo could help domain meta-predictors yield improved results.
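The profile-and-Z-score idea behind Armadillo can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the values in `TOY_INDEX` are hypothetical placeholders (the published DLI values are not reproduced here) that only mirror the qualitative finding that Pro and Gly favour linkers while small hydrophobic residues do not, and the window size and Z-score cutoff are arbitrary choices rather than Armadillo's parameters.

```python
from statistics import mean, pstdev

# Hypothetical toy index, NOT the published DLI: Pro/Gly high, small hydrophobics low.
TOY_INDEX = {aa: 0.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
TOY_INDEX.update({"P": 1.0, "G": 0.8, "A": -0.5, "V": -0.5, "I": -0.5, "L": -0.5})


def smoothed_profile(seq: str, index: dict[str, float], window: int = 5) -> list[float]:
    """Sliding-window mean of per-residue index values (window truncated at the ends)."""
    values = [index.get(aa, 0.0) for aa in seq]
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def z_scores(profile: list[float]) -> list[float]:
    """Standardize the profile; a flat profile gives all-zero scores."""
    mu, sd = mean(profile), pstdev(profile)
    if sd == 0:
        return [0.0] * len(profile)
    return [(x - mu) / sd for x in profile]


def linker_regions(seq: str, index: dict[str, float] = TOY_INDEX,
                   window: int = 5, z_cut: float = 1.0) -> list[tuple[int, int]]:
    """Contiguous runs (0-based, inclusive) whose Z-score exceeds z_cut."""
    z = z_scores(smoothed_profile(seq, index, window))
    regions, start = [], None
    for i, score in enumerate(z):
        if score > z_cut and start is None:
            start = i
        elif score <= z_cut and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(z) - 1))
    return regions
```

For example, a hydrophobic stretch, a Pro/Gly-rich segment, and another hydrophobic stretch (`"AVILAVILAV" + "PGPGP" + "AVILAVILAV"`) yields a single candidate linker spanning the Pro/Gly segment.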
5.
6.
Joe DeBartolo, Glen Hocky, Michael Wilde, Jinbo Xu, Karl F. Freed, Tobin R. Sosnick 《Protein science : a publication of the Protein Society》2010,19(3):520-534
For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template‐based modeling of protein structure and have been incorporated into fragment‐based assembly methods. Our previous homology‐free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of φ,ψ backbone dihedral angles that are obtained from a Protein Data Bank‐based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position‐resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.
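A Monte Carlo move that changes only one (φ, ψ) pair can be illustrated with a generic Metropolis step. This is a hedged sketch, not the authors' implementation: the per-residue candidate lists in `library` stand in for the PDB-derived, neighbour-conditioned distributions, and the `energy` callable and the Metropolis acceptance rule are assumptions.

```python
import math
import random
from typing import Callable, Sequence

Angles = list[tuple[float, float]]  # one (phi, psi) pair per residue, in degrees


def mc_step(angles: Angles,
            library: Sequence[Sequence[tuple[float, float]]],
            energy: Callable[[Angles], float],
            temperature: float,
            rng: random.Random) -> Angles:
    """One move: resample the (phi, psi) pair of a single residue, then accept
    or reject it with the Metropolis criterion. library[i] holds the candidate
    pairs for residue i (a hypothetical stand-in for a PDB-based distribution)."""
    i = rng.randrange(len(angles))
    trial = list(angles)
    trial[i] = rng.choice(library[i])
    delta = energy(trial) - energy(angles)
    # Downhill moves are always accepted; uphill moves with probability exp(-delta/T).
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return trial
    return angles
```

Because each call touches a single residue, repeated calls explore conformations one dihedral pair at a time, matching the move type described in the abstract.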
7.
《Genomics》2022,114(5):110454
Cis-regulatory elements (CREs) are non-coding parts of the genome that play a critical role in gene expression regulation. Enhancers, as an important example of CREs, interact with genes to influence complex traits like disease, heat tolerance and growth rate. Much of what is known about enhancers comes from studies of humans and a few model organisms like mouse, with little known about other mammalian species. Previous studies have attempted to identify enhancers in less-studied mammals using comparative genomics but with limited success. Recently, machine learning (ML) techniques have shown promising results in predicting enhancer regions. Here, we investigated the ability of ML methods to identify enhancers in three non-model mammalian species (cattle, pig and dog) using human and mouse enhancer data from VISTA and publicly available ChIP-seq. We tested nine models, using four different representations of the DNA sequences in cross-species prediction using both the VISTA dataset and species-specific ChIP-seq data. We identified between 809,399 and 877,278 enhancer-like regions (ELRs) in the study species (11.6–13.7% of each genome). These predictions were close to the ~8% proportion of ELRs that covered the human genome. We propose that our ML methods have predictive ability for identifying enhancers in non-model mammalian species. We have provided a list of high-confidence enhancers at https://github.com/DaviesCentreInformatics/Cross-species-enhancer-prediction and believe these enhancers will be of great use to the community.
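As an illustration of what a "representation" of a DNA sequence can look like, here is a k-mer frequency encoding in Python, a common input format for enhancer classifiers. The abstract does not name its four representations, so this encoding, the default k = 3, and the choice to skip k-mers containing ambiguous bases are assumptions.

```python
from itertools import product


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Normalized counts of every DNA k-mer (4**k features in lexicographic
    order over ACGT). k-mers containing non-ACGT characters such as N are skipped."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    position = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = position.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)
```

A fixed-length vector like this lets sequences of different lengths be fed to the same classifier, which is one reason such encodings suit cross-species prediction.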
8.
G protein-coupled receptors (GPCRs) are part of multi-protein networks called ‘receptosomes’. These GPCR-interacting proteins (GIPs) in the receptosomes control the targeting, trafficking and signaling of GPCRs. PDZ domain proteins constitute the largest protein family among the GIPs, and the predominant function of the PDZ domain proteins is to bring signaling pathway components into close proximity by recognizing the last four C-terminal amino acids of GPCRs. We present here a machine learning-based approach for the identification of GPCR-binding PDZ domain proteins. To characterize the network of interactions between amino acid residues that contribute to the stability of the PDZ domain-ligand complex, and to encode the complex into a feature vector, we constructed amino acid contact matrices and a physicochemical distance matrix. This novel machine learning-based method displayed high performance for the identification of PDZ domain-ligand interactions and allowed the identification of novel GPCR-PDZ domain protein interactions.
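To show how a domain-ligand complex can be encoded as a fixed-length feature vector, here is a Python sketch of an amino-acid contact matrix. The single representative coordinate per residue, the 8 Å default cutoff, and the 20 × 20 domain-by-ligand layout are assumptions for illustration; the abstract does not specify how its contact and physicochemical distance matrices were built.

```python
import math
from typing import Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# (one-letter code, representative coordinate in Å); the representation is an assumption
Residue = tuple[str, tuple[float, float, float]]


def contact_feature_vector(domain: Sequence[Residue], ligand: Sequence[Residue],
                           cutoff: float = 8.0) -> list[int]:
    """Count domain-ligand residue pairs closer than `cutoff` Å, binned by
    amino-acid type (rows: domain residue, columns: ligand residue), and
    flatten the 20 x 20 matrix into a 400-element vector."""
    matrix = [[0] * 20 for _ in range(20)]
    for aa_d, xyz_d in domain:
        for aa_l, xyz_l in ligand:
            if math.dist(xyz_d, xyz_l) < cutoff:
                matrix[AA_INDEX[aa_d]][AA_INDEX[aa_l]] += 1
    return [count for row in matrix for count in row]
```

Here the ligand would be the GPCR's last four C-terminal residues; every complex maps to the same 400 features regardless of domain length.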
9.
Brown SP 《Current biology : CB》2006,16(22):R960-R961
Putting a competitive squeeze on a cooperative group has long been considered to encourage cheats. Now we learn that competition, by driving diversification among cooperators, can create groups that are both more productive and more resistant to defection.
10.
Protein sequences have evolved to fold into functional structures, resulting in families of diverse protein sequences that all share the same overall fold. One can harness protein family sequence data to infer likely contacts between pairs of residues. In the current study, we combine this kind of inference from coevolutionary information with a coarse‐grained protein force field ordinarily used with single sequence input, the Associative memory, Water mediated, Structure and Energy Model (AWSEM), to achieve improved structure prediction. The resulting Associative memory, Water mediated, Structure and Energy Model with Evolutionary Restraints (AWSEM‐ER) yields a significant improvement in the quality of protein structure prediction over the single sequence prediction from AWSEM when a sufficiently large number of homologous sequences are available. Free energy landscape analysis shows that the addition of the evolutionary term shifts the free energy minimum to more native‐like structures, which explains the improvement in the quality of structures when performing predictions using simulated annealing. Simulations using AWSEM without coevolutionary information have proved useful in elucidating not only protein folding behavior, but also mechanisms of protein function. The success of AWSEM‐ER in de novo structure prediction suggests that the enhanced model opens the door to functional studies of proteins even when no experimentally solved structures are available.
11.
12.
The delineation of domain boundaries of a given sequence in the absence of known 3D structures or detectable sequence homology to known domains benefits many areas in protein science, such as protein engineering, protein 3D structure determination and protein structure prediction. With the exponential growth of newly determined sequences, the ability to predict domain boundaries rapidly and accurately from sequence information alone is essential for gene function annotation. Anyone attempting to predict domain boundaries for a single protein sequence is invariably confronted with a plethora of databases available on the internet that contain boundary information, and with a variety of methods for domain boundary prediction. How are these derived and how well do they work? What definition of 'domain' do they use? We will first clarify the different definitions of protein domains, and then describe the available public databases with domain boundary information. Finally, we will review existing domain boundary prediction methods and discuss their strengths and weaknesses.
13.
14.
15.
Proteins play key roles in diverse functions in our daily life, and their functional roles are generally enhanced through interactions compared with proteins acting alone. Identifying the essential proteins of an organism by wet-lab observation is time-consuming and costly, although wet-lab results offer high reliability and accuracy on biological grounds. Computational prediction of essential proteins is an alternative that delivers results rapidly and effectively reduces wet-lab costs. Existing computational methods have been built on protein-protein interaction networks (PPIN), sequences, gene expression datasets (GED), Gene Ontology (GO), orthologous groups, and subcellular localization datasets. Machine learning can draw on these diverse categories of features to model and predict essential macromolecules in understudied organisms. We propose a novel method, MEM-FET (membership feature), based on the edge clustering coefficient, average clustering coefficient, subcellular localization, and Gene Ontology within a compartment of common neighbors. The accuracy (ACC) values for the predicted true positive (TP) essential proteins are 0.79, 0.74, 0.78, and 0.71 on the YHQ, YMIPS, YDIP, and YMBD datasets, respectively. The MEM-FET algorithm also predicts an enriched set of essential proteins. An ensemble ML approach also validated the proposed model, with an accuracy of 60%. MEM-FET is predicted to outperform other existing algorithms, with an ACC value of 80% on the yeast dataset.
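The edge clustering coefficient listed among MEM-FET's features has a standard graph-theoretic definition, sketched here in Python on an adjacency-set representation of a PPI network. This follows one commonly used formula (number of common neighbours of u and v divided by min(deg(u) − 1, deg(v) − 1)); whether MEM-FET uses exactly this variant is an assumption, and the helper names are hypothetical.

```python
def build_adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from a list of interacting protein pairs."""
    adj: dict[str, set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_clustering_coefficient(adj: dict[str, set[str]], u: str, v: str) -> float:
    """ECC(u, v) = z / min(deg(u) - 1, deg(v) - 1), where z is the number of
    common neighbours (triangles through the edge). Returns 0.0 when the
    denominator is 0, i.e. when either endpoint has degree 1."""
    if v not in adj.get(u, set()):
        raise ValueError(f"{u}-{v} is not an edge")
    z = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return z / denom if denom > 0 else 0.0
```

A high ECC means the two proteins share many interaction partners relative to how many they could share, which is why it is often used as evidence that an interaction sits inside a dense, functionally important module.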
16.
Protein domain boundary prediction is critical for understanding protein structure and function. In this study, we present a novel method, an order profile domain linker propensity index (OPI), which uses the evolutionary information extracted from the protein sequence frequency profiles calculated from the multiple sequence alignments. A protein sequence is first converted into smoothed and normalized numeric order profiles by OPI, from which the domain linkers can be predicted. By discriminating the different frequencies of the amino acids in the protein sequence frequency profiles, OPI clearly shows better performance than our previous method, a binary profile domain linker propensity index (PDLI). We tested our new method on two different datasets, SCOP-1 and SCOP-2, and achieved precisions of 0.82 and 0.91, respectively. OPI also outperforms other residue-level, profile-level indexes as well as other state-of-the-art methods.
17.
Martin Sturm, Michael Hackenberg, David Langenberger, Dmitrij Frishman 《BMC bioinformatics》2010,11(1):292
Background
Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites.
18.
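RNA editing is a very important molecular mechanism in biological cells. As a step in post-transcriptional modification, it can increase proteomic diversity, alter the stability of transcripts, and regulate gene expression. Dysregulated RNA editing can lead to various diseases, including neurological diseases and cancer. In animals, adenosine-to-inosine (A-to-I) editing is the most prevalent type. Advances in high-throughput sequencing technology have greatly improved the ability to detect and quantify RNA editing on a global scale, making large-scale, genome-wide analysis of RNA editing feasible and giving rise to a series of RNA editing site prediction methods based on high-throughput sequencing. By introducing, summarizing, and analyzing these methods, we aim to offer some ideas for further research on RNA editing.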
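As a toy illustration of how sequencing-based methods spot A-to-I editing, here is a naive Python filter over per-position base counts. It relies only on the fact that reverse transcription reads inosine as guanosine, so editing appears as A-to-G mismatches against the reference. The `Site` record, the function name, and the coverage and fraction thresholds are hypothetical; real methods must also handle strand, known SNPs, and sequencing and mapping errors.

```python
from dataclasses import dataclass


@dataclass
class Site:
    position: int
    ref: str               # reference base on the transcribed (+) strand
    counts: dict[str, int]  # base -> number of reads supporting it


def candidate_a_to_i(sites: list[Site], min_coverage: int = 10,
                     min_g_fraction: float = 0.1) -> list[tuple[int, float]]:
    """Flag reference-A positions where enough reads show G (inosine is read as G).
    Returns (position, G fraction rounded to 3 decimals) for each candidate."""
    hits = []
    for s in sites:
        if s.ref != "A":
            continue
        coverage = sum(s.counts.values())
        if coverage < min_coverage:
            continue
        g_fraction = s.counts.get("G", 0) / coverage
        if g_fraction >= min_g_fraction:
            hits.append((s.position, round(g_fraction, 3)))
    return hits
```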
19.
20.
Computational prediction of protein complex structures through docking offers a means to gain a mechanistic understanding of protein interactions that mediate biological processes. This is particularly important as the number of experimentally determined structures of isolated proteins exceeds the number of structures of complexes. A comprehensive docking procedure is described in which efficient sampling of conformations is achieved by matching surface normal vectors, fast filtering for shape complementarity, clustering by RMSD, and scoring the docked conformations using a supervised machine learning approach. Contacting residue pair frequencies, residue propensities, evolutionary conservation, and the shape complementarity score for each docked conformation are used as input data to a Random Forest classifier. The performance of the Random Forest approach for selecting correctly docked conformations was assessed by cross-validation using a nonredundant benchmark set of X-ray structures for 93 heterodimer and 733 homodimer complexes. The single highest-ranked docking solution was the correct (near-native) structure for slightly more than one-third of the complexes. Furthermore, for almost all complexes, the fraction of correct structures among the highly ranked solutions was significantly higher than the overall fraction of correct structures. A detailed analysis of the difficult-to-predict complexes revealed that the majority of the homodimer cases were explained by incorrect oligomeric state annotation. Evolutionary conservation and shape complementarity score, as well as both underrepresented and overrepresented residue types and residue pairs, were found to make the largest contributions to the overall prediction accuracy. Finally, the method was also applied to docking unbound subunit structures from a previously published benchmark set.
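The RMSD-clustering step in this docking pipeline can be sketched in Python as follows. The abstract does not state which clustering algorithm was used, so the no-superposition RMSD (assuming every pose is expressed in the same receptor frame), the greedy leader strategy, and the `cutoff` parameter are assumptions.

```python
import math
from typing import Sequence

Coords = Sequence[tuple[float, float, float]]  # corresponding atoms, in Å


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD over corresponding atoms, without superposition (assumes both
    poses share the receptor's reference frame)."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def greedy_cluster(poses: Sequence[Coords], cutoff: float) -> list[list[int]]:
    """Leader clustering: each pose joins the first cluster whose leader (first
    member) is within `cutoff` Å RMSD; otherwise it starts a new cluster."""
    clusters: list[list[int]] = []
    for i, pose in enumerate(poses):
        for members in clusters:
            if rmsd(poses[members[0]], pose) <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

If poses are processed in order of a preliminary score (for example, shape complementarity), each cluster leader is its best-scoring member, which keeps the representatives passed on to the classifier meaningful.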