Similar articles
 Found 20 similar articles (search time: 0 ms)
1.
2.
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi‐nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.
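Two of the ingredients named in this abstract, the number of codons per amino acid and the transition/transversion split of single-nucleotide codon changes, follow directly from the standard genetic code. The sketch below is only an illustration of those two quantities, not the authors' model: it assumes the standard code, ignores multi-nucleotide changes and stability effects, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = {"A", "G"}


def is_transition(b1, b2):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return b1 != b2 and ((b1 in PURINES) == (b2 in PURINES))


def codons_per_amino_acid():
    """Number of sense codons encoding each amino acid."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def single_step_paths(aa_from, aa_to):
    """Count single-nucleotide codon changes from aa_from to aa_to.

    Returns (transitions, transversions).
    """
    ts = tv = 0
    for codon, aa in CODON_TABLE.items():
        if aa != aa_from:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa_to:
                    if is_transition(codon[pos], base):
                        ts += 1
                    else:
                        tv += 1
    return ts, tv
```

For example, `single_step_paths("W", "F")` finds no single-nucleotide path, so a Trp-to-Phe substitution would need a multi-nucleotide codon change.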

3.
Coarse‐grained models for protein structure are increasingly used in simulations and structural bioinformatics. In this study, we evaluated the effectiveness of three granularities of protein representation based on their ability to discriminate between correctly folded native structures and incorrectly folded decoy structures. The three levels of representation used one bead per amino acid (coarse), two beads per amino acid (medium), and all atoms (fine). Multiple structure features were compared at each representation level including two‐body interactions, three‐body interactions, solvent exposure, contact numbers, and angle bending. In most cases, the all‐atom level was most successful at discriminating decoys, but the two‐bead level provided a good compromise between the number of model parameters which must be estimated and the accuracy achieved. The most effective feature type appeared to be two‐body interactions. Considering three‐body interactions increased accuracy only marginally when all atoms were used and not at all in medium and coarse representations. Though two‐body interactions were most effective for the coarse representations, the accuracy loss for using only solvent exposure or contact number was proportionally less at these levels than in the all‐atom representation. We propose an optimization method capable of selecting bead types of different granularities to create a mixed representation of the protein. We illustrate its behavior on decoy discrimination and discuss implications for data‐driven protein model selection. Proteins 2013. © 2012 Wiley Periodicals, Inc.

4.
The development of a kinase structural database, the kinase knowledge base (KKB), is described. It covers all human kinase domain structures that have been deposited in the Protein Data Bank. All structures are renumbered using a common scheme, which enables efficient cross‐comparisons and multiple queries of interest to the kinase field. The common numbering scheme is also used to automatically annotate conserved residues and motifs, and conformationally classify the structures based on the DFG‐loop and Helix C. Analyses of residue conservation in the ATP binding site using the full human‐kinome–sequence alignment lead to the identification of a conserved hydrogen bond between the hinge region backbone and a glycine in the specificity surface. Furthermore, 90% of kinases are found to have at least one stabilizing interaction for the hinge region, which has not been described before.

5.
We have revisited the protein coarse-grained optimized potential for efficient structure prediction (OPEP). The training and validation sets consist of 13 and 16 protein targets. Because optimization depends on details of how the ensemble of decoys is sampled, trial conformations are generated by molecular dynamics, threading, greedy, and Monte Carlo simulations, or taken from publicly available databases. The OPEP parameters are varied by a genetic algorithm using a scoring function which requires that the native structure has the lowest energy, and the native-like structures have energy higher than the native structure but lower than the remote conformations. Overall, we find that OPEP correctly identifies 24 native or native-like states for 29 targets and has very similar capability to the all-atom discrete optimized protein energy model (DOPE), found recently to outperform five currently used energy models.

6.
The architecture and weights of an artificial neural network model that predicts putative transmembrane sequences have been developed and optimized by the algorithm of structure evolution. The resulting filter is able to classify membrane/nonmembrane transition regions in sequences of integral human membrane proteins with high accuracy. Similar results have been obtained for both training and test set data, indicating that the network has focused on general features of transmembrane sequences rather than specializing on the training data. Seven physicochemical amino acid properties have been used for sequence encoding. The predictions are compared to hydrophobicity plots.

7.
8.
A statistical approach was applied to select those models that best fit each individual mitochondrial (mt) protein at different taxonomic levels of metazoans. The existing mitochondrial replacement matrices, MtREV and MtMam, were found to be the best-fit models for the mt-proteins of vertebrates, with the exception of Nd6, at different taxonomic levels. Remarkably, existing mitochondrial matrices generally failed to best-fit invertebrate mt-proteins. In an attempt to better model the evolution of invertebrate mt-proteins, a new replacement matrix, named MtArt, was constructed based on arthropod mt-proteomes. The new model was found to best fit almost all analyzed invertebrate mt-protein data sets. The observed pattern of model fit across the different data sets indicates that no single replacement matrix is able to describe the general evolutionary properties of mt-proteins but rather that taxonomical biases and/or the existence of different mt-genetic codes have great influence on which model is selected.

9.
We introduce a side‐chain‐inclusive scoring function, named OPUS‐SSF, for ranking protein structural models. The method builds a scoring function based on the native distributions of the coordinate components of certain anchoring points in a local molecular system for peptide segments of 5, 7, 9, and 11 residues in length. Differing from our previous OPUS‐CSF [Xu et al., Protein Sci. 2018; 27: 286–292], which exclusively uses main chain information, OPUS‐SSF employs anchoring points on side chains so that the effect of side chains is taken into account. The performance of OPUS‐SSF was tested on 15 decoy sets containing a total of 603 proteins, and 571 of them had their native structures recognized from their decoys. Similar to OPUS‐CSF, OPUS‐SSF does not employ the Boltzmann formula in constructing scoring functions. The results indicate that OPUS‐SSF achieves a significant improvement in decoy recognition and should be a very useful tool for protein structural prediction and modeling.

10.
In multi‐domain proteins, the domains typically run end‐to‐end, that is, one domain follows the C‐terminus of another domain. However, approximately 10% of multi‐domain proteins are formed by insertion of one domain sequence into that of another domain. Detecting such insertions within protein sequences is a fundamental challenge in structural biology. The haloacid dehalogenase superfamily (HADSF) serves as a challenging model system wherein a variable cap domain (~5–200 residues in length) accessorizes the ubiquitous Rossmann‐fold core domain, with variations in insertion site and topology corresponding to different classes of cap types. Herein, we describe a comprehensive computational strategy, CapPredictor, for determining large, variable domain insertions in protein sequences. Using a novel sequence‐alignment algorithm in conjunction with a structure‐guided sequence profile from 154 core‐domain‐only structures, more than 40,000 HADSF member sequences were assigned cap types. The resulting data set afforded insight into HADSF evolution. Notably, a similar distribution of cap‐type classes across different phyla was observed, indicating that all cap types existed in the last universal common ancestor. In addition, comparative analyses of the predicted cap‐type and functional assignments showed that different cap types carry out similar chemistries. Thus, while cap domains play a role in substrate recognition and chemical reactivity, cap‐type does not strictly define functional class. Through this example, we have shown that CapPredictor is an effective new tool for the study of form and function in protein families where domain insertion occurs. Proteins 2014; 82:1896–1906. © 2014 Wiley Periodicals, Inc.

11.
12.
Evaluation of protein models against the native structure is essential for the development and benchmarking of protein structure prediction methods. Although a number of evaluation scores have been proposed to date, many aspects of model assessment still lack desired robustness. In this study we present CAD‐score, a new evaluation function quantifying differences between physical contacts in a model and the reference structure. The new score uses the concept of residue–residue contact area difference (CAD) introduced by Abagyan and Totrov (J Mol Biol 1997; 268:678–685). Contact areas, the underlying basis of the score, are derived using the Voronoi tessellation of protein structure. The newly introduced CAD‐score is a continuous function, confined within fixed limits, free of any arbitrary thresholds or parameters. The built‐in logic for treatment of missing residues allows consistent ranking of models of any degree of completeness. We tested CAD‐score on a large set of diverse models and compared it to GDT‐TS, a widely accepted measure of model accuracy. Similarly to GDT‐TS, CAD‐score showed a robust performance on single‐domain proteins, but displayed a stronger preference for physically more realistic models. Unlike GDT‐TS, the new score revealed a balanced assessment of domain rearrangement, removing the necessity for different treatment of single‐domain, multi‐domain, and multi‐subunit structures. Moreover, CAD‐score makes it possible to assess the accuracy of inter‐domain or inter‐subunit interfaces directly. In addition, the approach offers an alternative to the superposition‐based model clustering. The CAD‐score implementation is available both as a web server and a standalone software package at http://www.ibt.lt/bioinformatics/cad‐score/ . Proteins 2013. © 2012 Wiley Periodicals, Inc.
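The abstract does not spell out the formula, so the following is a minimal sketch under a stated assumption: that the score takes the bounded form 1 − Σ min(|T − M|, T) / Σ T over the residue–residue contacts of the reference, where T and M are contact areas in the reference and the model. The contact areas are supplied here as plain dictionaries (hypothetical inputs); deriving them from a Voronoi tessellation is outside the sketch, and `cad_score` is an illustrative name, not the interface of the published software.

```python
def cad_score(reference_areas, model_areas):
    """Bounded contact-area-difference score in [0, 1] (assumed form).

    reference_areas, model_areas: dicts mapping a residue pair (i, j)
    to a contact area. Contacts absent from the model count as area 0,
    so incomplete models are penalized rather than rejected.
    """
    total = sum(reference_areas.values())
    if total <= 0:
        raise ValueError("reference structure has no contact area")
    # Each per-contact difference is capped at the reference area, which
    # keeps the score within fixed limits even if the model over-predicts.
    diff = sum(
        min(abs(area - model_areas.get(pair, 0.0)), area)
        for pair, area in reference_areas.items()
    )
    return 1.0 - diff / total
```

Because only contacts present in the reference are summed, restricting the dictionary to inter-domain pairs would score an interface on its own, in the spirit of the interface assessment mentioned above.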

13.
Recognition of short linear motifs (SLiMs) or peptides by proteins is an important component of many cellular processes. However, due to limited and degenerate binding motifs, prediction of cellular targets is challenging. In addition, many of these interactions are transient and of relatively low affinity. Here, we focus on one of the largest families of SLiM‐binding domains in the human proteome, the PDZ domain. These domains bind the extreme C‐terminus of target proteins, and are involved in many signaling and trafficking pathways. To predict endogenous targets of PDZ domains, we developed MotifAnalyzer‐PDZ, a program that filters and compares all motif‐satisfying sequences in any publicly available proteome. This approach enables us to determine possible PDZ binding targets in humans and other organisms. Using this program, we predicted and biochemically tested novel human PDZ targets by looking for strong sequence conservation in evolution. We also identified three C‐terminal sequences in choanoflagellates that bind a choanoflagellate PDZ domain, the Monosiga brevicollis SHANK1 PDZ domain (mbSHANK1), with endogenously‐relevant affinities, despite a lack of conservation with the targets of a homologous human PDZ domain, SHANK1. All three are predicted to be signaling proteins, with strong sequence homology to cytosolic and receptor tyrosine kinases. Finally, we analyzed and compared the positional amino acid enrichments in PDZ motif‐satisfying sequences from over a dozen organisms. Overall, MotifAnalyzer‐PDZ is a versatile program to investigate potential PDZ interactions. This proof‐of‐concept work is poised to enable similar types of analyses for other SLiM‐binding domains (e.g., MotifAnalyzer‐Kinase). MotifAnalyzer‐PDZ is available at http://motifAnalyzerPDZ.cs.wwu.edu .

14.
The conformational properties of unbound multi‐Cys2His2 (mC2H2) zinc finger proteins, in which zinc finger domains are connected by flexible linkers, are studied by a multiscale approach. Three methods on different length scales are utilized. First, atomic detail molecular dynamics simulations of one zinc finger and its adjacent flexible linker confirmed that the zinc finger is more rigid than the flexible linker. Second, the end‐to‐end distance distributions of mC2H2 zinc finger proteins are computed using an efficient atomistic pivoting algorithm, which only takes excluded volume interactions into consideration. The end‐to‐end distance distribution gradually changes its profile, from left‐tailed to right‐tailed, as the number of zinc fingers increases. This is explained by using a worm‐like chain model. For proteins of a few zinc fingers, an effective bending constraint favors an extended conformation. Only for proteins containing more than nine zinc fingers is a somewhat compacted conformation preferred. Third, a mesoscale model is modified to study both the local and the global conformational properties of multi‐C2H2 zinc finger proteins. Simulations of the CCCTC‐binding factor (CTCF), an important mC2H2 zinc finger protein for genome spatial organization, are presented. Proteins 2015; 83:1604–1615. © 2015 Wiley Periodicals, Inc.
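The worm‐like chain argument can be made concrete with the textbook expression for the mean‐square end‐to‐end distance of an ideal worm‐like chain, ⟨R²⟩ = 2PL[1 − (P/L)(1 − e^(−L/P))], with contour length L and persistence length P. The sketch below evaluates only this standard result; it is not the authors' calculation of the full distance distribution, and the parameter values in the usage note are arbitrary.

```python
import math


def wlc_mean_square_end_to_end(contour_length, persistence_length):
    """Mean-square end-to-end distance <R^2> of an ideal worm-like chain."""
    if contour_length <= 0 or persistence_length <= 0:
        raise ValueError("lengths must be positive")
    x = contour_length / persistence_length
    # 1 - (1 - exp(-x)) / x, written with expm1 so it stays accurate for small x.
    return 2.0 * persistence_length * contour_length * (1.0 + math.expm1(-x) / x)
```

When L is much shorter than P the result approaches L² (a rod‐like, extended chain), and when L is much longer than P it approaches 2PL (a flexible coil), which is the qualitative crossover the abstract describes as the number of zinc fingers grows.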

15.
Hyun Joo, Jerry Tsai. Proteins, 2014, 82(9): 2128-2140
To understand the relationship between protein sequence and structure, this work extends the knob‐socket model in an investigation of β‐sheet packing. Over a comprehensive set of β‐sheet folds, the contacts between residues were used to identify packing cliques: sets of residues that all contact each other. These packing cliques were then classified based on size and contact order. From this analysis, the two types of four‐residue packing cliques necessary to describe β‐sheet packing were characterized. Both occur between two adjacent hydrogen bonded β‐strands. First, defining the secondary structure packing within β‐sheets, the combined socket or XY:HG pocket consists of four residues i, i+2 on one strand and j, j+2 on the other. Second, characterizing the tertiary packing between β‐sheets, the knob‐socket XY:H+B consists of a three‐residue XY:H socket (i, i+2 on one strand and j on the other) packed against a knob B residue (residue k distant in sequence). Depending on the packing depth of the knob B residue, two types of knob‐sockets are found: side‐chain and main‐chain sockets. The amino acid composition of the pockets and knob‐sockets reveals the sequence specificity of β‐sheet packing. For β‐sheet formation, the XY:HG pocket clearly shows sequence specificity of amino acids. For tertiary packing, the XY:H+B side‐chain and main‐chain sockets exhibit distinct amino acid preferences at each position. These relationships define an amino acid code for β‐sheet structure and provide an intuitive topological mapping of β‐sheet packing. Proteins 2014; 82:2128–2140. © 2014 Wiley Periodicals, Inc.
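A packing clique, as defined above, is a set of residues in which every pair is in contact, so it corresponds to a clique in the residue contact graph. The sketch below enumerates cliques of a given size by brute force; it assumes the contact list is already available (how contacts are determined is not stated in the abstract), and the function name and residue numbers are hypothetical.

```python
from itertools import combinations


def packing_cliques(contacts, size=4):
    """Return all residue sets of the given size in which every pair is in contact.

    contacts: iterable of (residue_a, residue_b) pairs, treated as undirected.
    """
    neighbors = {}
    for a, b in contacts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    residues = sorted(neighbors)
    # Brute force over all candidate sets; adequate for the small local
    # neighborhoods of a single structure, not for very large graphs.
    return [
        group
        for group in combinations(residues, size)
        if all(b in neighbors[a] for a, b in combinations(group, 2))
    ]
```

With residues i, i+2 on one strand and j, j+2 on the other all in mutual contact, for example residues 10, 12, 30, and 32, the function returns that single four‐residue set, matching the shape of the XY:HG pocket.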

16.
We present a Model Quality Assessment Program (MQAP), called MQAPsingle, for ranking and assessing the absolute global quality of single protein models. MQAPsingle is a quasi single‐model MQAP, a method that combines advantages of both “pure” single‐model MQAPs and clustering MQAPs. This approach results in higher accuracy compared to the state‐of‐the‐art single‐model MQAPs. Notably, the prediction for a given model is the same regardless of whether this model is submitted to our server alone or together with other models. Proteins 2016; 84:1021–1028. © 2015 Wiley Periodicals, Inc.

17.
The extracellular isoform of superoxide dismutase (EC‐SOD, Sod3) plays a protective role against various diseases and injuries mediated by oxidative stress. To investigate the pathophysiological roles of EC‐SOD, we generated tetracycline‐inducible Sod3 transgenic mice and directed the tissue‐specific expression of transgenes by crossing Sod3 transgenic mice with tissue‐specific transactivator transgenics. Double transgenic mice with liver‐specific expression of Sod3 showed increased EC‐SOD levels predominantly in the plasma as the circulating form, whereas double transgenic mice with neuronal‐specific expression expressed higher levels of EC‐SOD in hippocampus and cortex with intact EC‐SOD as the dominant form. EC‐SOD protein levels also correlated well with increased SOD activities in double transgenic mice. In addition to enabling tissue‐specific expression, the transgene expression can be quickly turned on and off by doxycycline supplementation in the mouse chow. This mouse model, thus, provides the flexibility for on–off control of transgene expression in multiple target tissues. genesis 47:142–154, 2009. © 2009 Wiley‐Liss, Inc.

18.
N‐Acetylneuraminic acid (NANA) is the most common naturally occurring sialic acid and plays a key role in the pathogenesis of a select number of neuroinvasive bacteria such as Neisseria meningitidis. NANA is synthesized in prokaryotes via a condensation reaction between phosphoenolpyruvate and N‐acetylmannosamine. This reaction is catalyzed by a domain‐swapped, homodimeric enzyme, N‐acetylneuraminic acid synthase (NANAS). NANAS comprises two distinct domains: an N‐terminal catalytic (β/α)8 barrel linked to a C‐terminal antifreeze protein‐like (AFPL) domain. We have investigated the role of the AFPL domain by characterizing a truncated variant of NmeNANAS, which was discovered to be soluble yet inactive. Analytical ultracentrifugation and analytical size exclusion were used to probe the quaternary state of the NmeNANAS truncation, and revealed that loss of the AFPL domain destabilizes the dimeric form of the enzyme. The results from this study thereby demonstrate that the AFPL domain plays a critical role for both the catalytic function and quaternary structure stability of NANAS. Small angle X‐ray scattering, molecular dynamics simulations, and amino acid substitutions expose a complex hydrogen‐bonding relay, which links the roles of the catalytic and AFPL domains across subunit boundaries. Proteins 2014; 82:2054–2066. © 2014 Wiley Periodicals, Inc.

19.
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence‐structure‐dynamics‐function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence‐conserved residues and build a phylogenetic tree. Three‐dimensional structure alignment was also applied to obtain structure‐conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which the protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics.

20.
Conservation of biological communities requires accurate estimates of abundance for multiple species. Recent advances in estimating abundance of multiple species, such as Bayesian multispecies N‐mixture models, account for multiple sources of variation, including detection error. However, false‐positive errors (misidentification or double counts), which are prevalent in multispecies data sets, remain largely unaddressed. The dependent‐double observer (DDO) method is an emerging method that both accounts for detection error and is suggested to reduce the occurrence of false positives because it relies on two observers working collaboratively to identify individuals. To date, the DDO method has not been combined with advantages of multispecies N‐mixture models. Here, we derive an extension of a multispecies N‐mixture model using the DDO survey method to create a multispecies dependent double‐observer abundance model (MDAM). The MDAM uses a hierarchical framework to account for biological and observational processes in a statistically consistent framework while using the accurate observation data from the DDO survey method. We demonstrate that the MDAM accurately estimates abundance of multiple species with simulated and real multispecies data sets. Simulations showed that the model provides both precise and accurate abundance estimates, with average credible interval coverage across 100 repeated simulations of 94.5% for abundance estimates and 92.5% for detection estimates. In addition, 92.2% of abundance estimates had a mean absolute percent error between 0% and 20%, with a mean of 7.7%. We present the MDAM as an important step forward in expanding the applicability of the DDO method to a multispecies setting. Previous implementation of the DDO method suggests the MDAM can be applied to a broad array of biological communities. We suggest that researchers interested in assessing biological communities consider the MDAM as a tool for deriving accurate, multispecies abundance estimates.
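To illustrate how a dependent double‐observer survey accounts for detection error, the sketch below uses the usual textbook formulation, which is an assumption here since the abstract gives no equations: the primary observer detects an individual with probability p1, and the secondary observer records only individuals the primary missed, detecting each with probability p2. The function names are hypothetical and this is not the MDAM itself, which embeds such probabilities in a hierarchical multispecies model.

```python
def ddo_cell_probabilities(p_primary, p_secondary):
    """Probabilities that an individual is recorded by the primary observer,
    recorded by the secondary observer only, or missed by both."""
    for p in (p_primary, p_secondary):
        if not 0.0 <= p <= 1.0:
            raise ValueError("detection probabilities must lie in [0, 1]")
    seen_by_primary = p_primary
    seen_by_secondary_only = (1.0 - p_primary) * p_secondary
    missed_by_both = (1.0 - p_primary) * (1.0 - p_secondary)
    return seen_by_primary, seen_by_secondary_only, missed_by_both


def expected_count(abundance, p_primary, p_secondary):
    """Expected number of individuals recorded by either observer."""
    _, _, missed = ddo_cell_probabilities(p_primary, p_secondary)
    return abundance * (1.0 - missed)
```

With p1 = 0.6 and p2 = 0.5, for instance, the combined detection probability is 0.8, so a site holding 50 individuals would yield 40 recorded individuals on average.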
