Similar literature
Found 20 similar articles
1.
WebFEATURE (http://feature.stanford.edu/webfeature/) is a web-accessible structural analysis tool that allows users to scan query structures for functional sites in both proteins and nucleic acids. WebFEATURE is the public interface to the scanning algorithm of the FEATURE package, a supervised learning algorithm for creating and identifying 3D, physicochemical motifs in molecular structures. Given an input structure or Protein Data Bank identifier (PDB ID), and a statistical model of a functional site, WebFEATURE will return rank-scored 'hits' in 3D space that identify regions in the structure where similar distributions of physicochemical properties occur relative to the site model. Users can visualize and interactively manipulate scored hits and the query structure in web browsers that support the Chime plug-in. Alternatively, results can be downloaded and visualized through other freely available molecular modeling tools, like RasMol, PyMOL and Chimera. A major application of WebFEATURE is in rapid annotation of function to structures in the context of structural genomics.

2.
The increase in known three-dimensional protein structures enables us to build statistical profiles of important functional sites in protein molecules. These profiles can then be used to recognize sites in large-scale automated annotations of new protein structures. We report an improved FEATURE system which recognizes functional sites in protein structures. FEATURE defines multi-level physico-chemical properties and recognizes sites based on the spatial distribution of these properties in the sites' microenvironments. It uses a Bayesian scoring function to compare a query region with the statistical profile built from known examples of sites and control nonsites. We have previously shown that FEATURE can accurately recognize calcium-binding sites and have reported interesting results scanning for calcium-binding sites in the entire Protein Data Bank. Here we report the ability of the improved FEATURE to characterize and recognize geometrically complex and asymmetric sites such as ATP-binding sites and disulfide bond-forming sites. FEATURE does not rely on conserved residues or conserved residue geometry of the sites. We also demonstrate that, in the absence of a statistical profile of the sites, FEATURE can use an artificially constructed profile based on a priori knowledge to recognize the sites in new structures, using redoxin active sites as an example.
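The abstract above does not give the scoring formula, so the following is only a minimal sketch of one common way to realize a Bayesian site/nonsite score: a naive-Bayes log-odds sum over binned property counts. The function names, the bin edges, the pseudocount and the toy data are all hypothetical and are not taken from the FEATURE package.

```python
import math
from bisect import bisect_right


def bin_index(value, edges):
    """Index of the bin that `value` falls into, given sorted inner bin edges."""
    return bisect_right(edges, value)


def train_profile(site_examples, nonsite_examples, edges, pseudocount=1.0):
    """Estimate P(bin | site) and P(bin | nonsite) for each feature.

    Each example is a list of property counts (one per feature, e.g. one
    property in one radial shell). The pseudocount keeps every bin non-zero.
    """
    n_features = len(site_examples[0])
    n_bins = len(edges) + 1
    profile = []
    for f in range(n_features):
        tables = []
        for examples in (site_examples, nonsite_examples):
            counts = [pseudocount] * n_bins
            for example in examples:
                counts[bin_index(example[f], edges)] += 1
            total = sum(counts)
            tables.append([c / total for c in counts])
        profile.append((tables[0], tables[1]))
    return profile


def score(query, profile, edges):
    """Sum of log-odds log(P(bin|site) / P(bin|nonsite)) over all features."""
    total = 0.0
    for value, (p_site, p_nonsite) in zip(query, profile):
        b = bin_index(value, edges)
        total += math.log(p_site[b] / p_nonsite[b])
    return total


# Hypothetical toy data: two features per example.
EDGES = [1.5, 2.5]
SITES = [[3, 0], [4, 1], [3, 1]]
NONSITES = [[0, 2], [1, 3], [0, 2]]
PROFILE = train_profile(SITES, NONSITES, EDGES)

if __name__ == "__main__":
    print("site-like query:   ", score([3, 0], PROFILE, EDGES))
    print("nonsite-like query:", score([0, 2], PROFILE, EDGES))
```

In this toy setting a positive score means the query region resembles the site examples more than the nonsite controls, which is the kind of rank-ordering a scan over a structure would rely on. Treating the features as independent is an assumption of the sketch, not a claim about the published method.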

3.
The shikimate pathway is an attractive target for herbicide and antimicrobial agent development because it is essential in algae, higher plants, bacteria, and fungi, but absent from mammals. Homologues to enzymes in the shikimate pathway have been identified in the genome sequence of Mycobacterium tuberculosis. Among them, the EPSP synthase was proposed to be present by sequence homology. Accordingly, in order to pave the way for structural and functional efforts towards anti-mycobacterial agent development, here we describe the molecular modeling of 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase isolated from M. tuberculosis that should provide a structural framework on which the design of specific inhibitors may be based. Significant differences in the relative orientation of the domains in the two models result in "open" and "closed" conformations. The possible relevance of this structural transition to ligand binding is discussed.

4.
Tuberculosis (TB) resurged in the late 1980s and now kills approximately 3 million people a year. The reemergence of tuberculosis as a public health threat has created a need to develop new anti-mycobacterial agents. The shikimate pathway is an attractive target for herbicide and antimicrobial agent development because it is essential in algae, higher plants, bacteria, and fungi, but absent from mammals. Homologs to enzymes in the shikimate pathway have been identified in the genome sequence of Mycobacterium tuberculosis. Among them, the shikimate kinase I encoding gene (aroK) was proposed to be present by sequence homology. Accordingly, to pave the way for structural and functional efforts towards anti-mycobacterial agent development, here we describe the molecular modeling of M. tuberculosis shikimate kinase that should provide a structural framework on which the design of specific inhibitors may be based.

5.
An important goal of structural genomics is to complete the structural analysis of all the enzymes in metabolic pathways and to understand the structural similarities and differences. A preliminary glimpse of this type of analysis was achieved before structural genomics efforts with the glycolytic pathway, and efforts are underway for many other pathways, including that of catecholamine metabolism. Structural enzymology necessitates a complete structural characterization, even for highly homologous proteins (greater than 80% sequence homology), as every active site has distinct structural features and it is these active site differences that distinguish one enzyme from another. Shortcuts with homology modeling cannot be taken with our current knowledge base. Each enzyme structure in a pathway needs to be determined, including structures containing bound substrates, cofactors, products and transition state analogs, in order to obtain a complete structural and functional understanding of pathway-related enzymes.

6.
Building structural models of entire cells has been a long-standing cross-discipline challenge for the research community, as it requires an unprecedented level of integration between multiple sources of biological data and enhanced methods for computational modeling and visualization. Here, we present the first 3D structural models of an entire Mycoplasma genitalium (MG) cell, built using the CellPACK suite of computational modeling tools. Our model recapitulates the data described in recent whole-cell systems biology simulations and provides a structural representation for all MG proteins, DNA and RNA molecules, obtained by combining experimental and homology-modeled structures and lattice-based models of the genome. We establish a framework for gathering, curating and evaluating these structures, exposing current weaknesses of modeling methods and the boundaries of MG structural knowledge, and visualization methods to explore functional characteristics of the genome and proteome. We compare two approaches for data gathering, a manually curated workflow and an automated workflow that uses homologous structures, both of which are appropriate for the analysis of mesoscale properties such as crowding and volume occupancy. Analysis of model quality provides estimates of the regularization that will be required when these models are used as starting points for atomic molecular dynamics simulations.

7.
Kosloff M, Kolodny R. Proteins, 2008, 71(2): 891-902
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

8.
Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural information to guide sequence alignments constructed by our MSA program PROMALS. The resulting tool, PROMALS3D, automatically identifies homologs with known 3D structures for the input sequences, derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about input proteins and their homologs. PROMALS3D can also align sequences of multiple input structures, with the output representing a multiple structure-based alignment refined in combination with sequence constraints. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both sequences and structures of proteins. PROMALS3D outperforms a number of existing methods for constructing multiple sequence or structural alignments using both reference-dependent and reference-independent evaluation methods.

9.
Comparative analysis of structure and function of macromolecules, such as proteins, is an integral part of modern evolutionary biology. The first and critical step in understanding evolution of homologous proteins is their amino acid sequence alignment. However, standard algorithms fail to provide unambiguous sequence alignment for proteins of poor homology. More reliable results can be provided by comparing experimental 3D structures obtained at atomic resolution with the aid of X-ray structural analysis. If such structures are lacking, homology modeling is used, which considers indirect experimental data on functional roles of individual amino acid residues. An important problem is that sequence alignment, which reflects genetic modifications, does not necessarily correspond to functional homology, which depends on 3D structures critical for natural selection. Since the alignment techniques relying only on the analysis of primary structures carry no information on the functional properties of proteins, the inclusion of 3D structures into consideration is of utmost importance. Here we consider several ion channels as examples to demonstrate that alignment of their 3D structures can significantly improve sequence alignment obtained by traditional methods.

10.
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures were measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure similarity is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomics approach, as well as for genome annotation.

11.
The thioredoxin family of oxidoreductases plays an important role in redox signaling and control of protein function. Not only are thioredoxins linked to a variety of disorders, but their stable structure has also seen application in protein engineering. Both sequence-based and structure-based tools exist for thioredoxin identification, but remote homolog detection remains a challenge. We developed a thioredoxin predictor using the approach of integrating sequence with structural information. We combined a sequence-based Hidden Markov Model (HMM) with a molecular dynamics enhanced structure-based recognition method (dynamic FEATURE, DF). This hybrid method (HMMDF) has high precision and recall (0.90 and 0.95, respectively) compared with HMM (0.92 and 0.87, respectively) and DF (0.82 and 0.97, respectively). Dynamic FEATURE is sensitive but struggles to resolve closely related protein families, while HMM identifies these evolutionary differences by compromising sensitivity. Our method applied to structural genomics targets makes a strong prediction of a novel thioredoxin.
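The three precision/recall pairs reported above can be put on a single scale with the F1 score (the harmonic mean of precision and recall). The abstract does not report F1, so the short sketch below is an added illustration; only the six input numbers come from the text.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract.
REPORTED = {
    "HMMDF": (0.90, 0.95),
    "HMM": (0.92, 0.87),
    "DF": (0.82, 0.97),
}

if __name__ == "__main__":
    for name, (p, r) in REPORTED.items():
        print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1(p, r):.3f}")
```

On these inputs the hybrid comes out at about 0.92, against about 0.89 for each single method, which matches the abstract's reading that HMM gives up recall while DF gives up precision.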

12.
Typically, protein spatial structures are more conserved in evolution than amino acid sequences. However, the recent explosion of sequence and structure information accompanied by the development of powerful computational methods led to the accumulation of examples of homologous proteins with globally distinct structures. Significant sequence conservation, local structural resemblance, and functional similarity strongly indicate evolutionary relationships between these proteins despite pronounced structural differences at the fold level. Several mechanisms such as insertions/deletions/substitutions, circular permutations, and rearrangements in beta-sheet topologies account for the majority of detected structural irregularities. The existence of evolutionarily related proteins that possess different folds brings new challenges to the homology modeling techniques and the structure classification strategies and offers new opportunities for protein design in experimental studies.

13.
The Wiggle series are support vector machine-based predictors that identify regions of functional flexibility using only protein sequence information. Functionally flexible regions are defined as regions that can adopt different conformational states and are assumed to be necessary for bioactivity. Many advances have been made in understanding the relationship between protein sequence and structure. This work contributes to those efforts by making strides to understand the relationship between protein sequence and flexibility. A coarse-grained protein dynamic modeling approach was used to generate the dataset required for support vector machine training. We define our regions of interest based on the participation of residues in correlated large-scale fluctuations. Even with this structure-based approach to computationally define regions of functional flexibility, predictors successfully extract sequence-flexibility relationships that have been experimentally confirmed to be functionally important. Thus, a sequence-based tool to identify flexible regions important for protein function has been created. The ability to identify functional flexibility using a sequence-based approach complements structure-based definitions and will be especially useful for the large majority of proteins with unknown structures. The methodology offers promise to identify structural genomics targets amenable to crystallization and the possibility to engineer more flexible or rigid regions within proteins to modify their bioactivity.

14.
Rapid advances in structural biology have revealed the three-dimensional structures of many biocatalysts. Molecular modeling is the tool that links these structures with experimental observations. As a qualitative tool, current modeling methods are extremely useful. They can explain, on a molecular level, unusual features of reactions. They can predict how to increase the selectivity either by substrate modification or by site-directed mutagenesis. Quantitative predictions, for example the degree of enantioselectivity, are still not reliable, however. Modeling is limited also by the availability of three-dimensional structures. Most current modeling involves hydrolases, especially proteases and lipases, but structures for other types of enzymes are starting to appear.

15.
Mammalian immunoglobulin VH families can be grouped into three distinct clans based upon sequence conservation in two of the three framework (FR) intervals. Through replacement/silent site substitution analysis, molecular modeling and mathematical evaluation of known immunoglobulin crystal structures, we demonstrate that this conservation reflects preservation of protein sequence and structure. Each clan contains a characteristic FR 1 interval that is solvent-exposed and structurally separated from the antigen binding site. Families within a clan contain their own unique FR 3 interval that is capable of either influencing the conformation of the antigen binding site or interacting directly with antigen. Our results provide a structural context for theories that address differential use of VH families in the immune response.

16.
The Wiggle series are support vector machine-based predictors that identify regions of functional flexibility using only protein sequence information. Functionally flexible regions are defined as regions that can adopt different conformational states and are assumed to be necessary for bioactivity. Many advances have been made in understanding the relationship between protein sequence and structure. This work contributes to those efforts by making strides to understand the relationship between protein sequence and flexibility. A coarse-grained protein dynamic modeling approach was used to generate the dataset required for support vector machine training. We define our regions of interest based on the participation of residues in correlated large-scale fluctuations. Even with this structure-based approach to computationally define regions of functional flexibility, predictors successfully extract sequence-flexibility relationships that have been experimentally confirmed to be functionally important. Thus, a sequence-based tool to identify flexible regions important for protein function has been created. The ability to identify functional flexibility using a sequence-based approach complements structure-based definitions and will be especially useful for the large majority of proteins with unknown structures. The methodology offers promise to identify structural genomics targets amenable to crystallization and the possibility to engineer more flexible or rigid regions within proteins to modify their bioactivity.

17.
The dramatically increasing number of new protein sequences arising from genomics and proteomics creates a need for methods to rapidly and reliably infer the molecular and cellular functions of these proteins. One such approach, structural genomics, aims to delineate the total repertoire of protein folds in nature, thereby providing three-dimensional folding patterns for all proteins and to infer molecular functions of the proteins based on the combined information of structures and sequences. The goal of obtaining protein structures on a genomic scale has motivated the development of high throughput technologies and protocols for macromolecular structure determination that have begun to produce structures at a greater rate than previously possible. These new structures have revealed many unexpected functional inferences and evolutionary relationships that were hidden at the sequence level. Here, we present samples of structures determined at Berkeley Structural Genomics Center and collaborators' laboratories to illustrate how structural information provides and complements sequence information to deduce the functional inferences of proteins with unknown molecular functions.
Two of the major premises of structural genomics are to discover a complete repertoire of protein folds in nature and to find molecular functions of the proteins whose functions are not predicted from sequence comparison alone. To achieve these objectives on a genomic scale, new methods, protocols, and technologies need to be developed by multi-institutional collaborations worldwide. As part of this effort, the Protein Structure Initiative has been launched in the United States (PSI; www.nigms.nih.gov/funding/psi.html).
Although infrastructure building and technology development are still the main focus of structural genomics programs [1–6], a considerable number of protein structures have already been produced, some of them coming directly out of semi-automated structure determination pipelines [6–10]. The Berkeley Structural Genomics Center (BSGC) has focused on the proteins of Mycoplasma or their homologues from other organisms as its structural genomics targets because of the minimal genome size of the Mycoplasmas as well as their relevance to human and animal pathogenicity (http://www.strgen.org). Here we present several protein examples encompassing a spectrum of functional inferences obtainable from their three-dimensional structures in five situations, where the inferences are new and testable, and are not predictable from protein sequence information alone.

18.
Bacterial pathogens have co-evolved with their hosts in their ongoing quest for advantage in the resulting interaction. These intimate associations have resulted in remarkable adaptations of prokaryotic virulence proteins and their eukaryotic molecular targets. An important strategy used by microbial pathogens of animals to manipulate host cellular functions is structural mimicry of eukaryotic proteins. Recent evidence demonstrates that plant pathogens also use structural mimicry of host factors as a virulence strategy. Nearly all virulence proteins from phytopathogenic bacteria have eluded functional annotation on the basis of primary amino-acid sequence. Recent efforts to determine their three-dimensional structures are, however, revealing important clues about the mechanisms of bacterial virulence in plants.

19.
Structural genomics (or proteomics) activities are critically dependent on the availability of high-throughput structure determination methodology. Development of such methodology has been a particular challenge for NMR based structure determination because of the demands for isotopic labeling of proteins and the requirements for very long data acquisition times. We present here a methodology that gains efficiency from a focus on determination of backbone structures of proteins as opposed to full structures with all sidechains in place. This focus is appropriate given the presumption that many protein structures in the future will be built using computational methods that start from representative fold family structures and replace as many as 70% of the sidechains in the course of structure determination. The methodology we present is based primarily on residual dipolar couplings (RDCs), readily accessible NMR observables that constrain the orientation of backbone fragments irrespective of separation in space. A new software tool is described for the assembly of backbone fragments under RDC constraints and an application to a structural genomics target is presented. The target is an 8.7 kDa protein from Pyrococcus furiosus, PF1061, that was previously not well annotated, and had a nearest structurally characterized neighbor with only 33% sequence identity. The structure produced shows structural similarity to this sequence homologue, but also shows similarity to other proteins, which suggests a functional role in sulfur transfer. Given the backbone structure and a possible functional link this should be an ideal target for development of modeling methods.

20.
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable to smaller proteins, leaving much space for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state-of-the-art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
