Similar literature
1.
The dramatically increasing number of new protein sequences arising from genomics and proteomics creates a need for methods to rapidly and reliably infer the molecular and cellular functions of these proteins. One such approach, structural genomics, aims to delineate the total repertoire of protein folds in nature, thereby providing three-dimensional folding patterns for all proteins, and to infer molecular functions of the proteins based on the combined information of structures and sequences. The goal of obtaining protein structures on a genomic scale has motivated the development of high-throughput technologies and protocols for macromolecular structure determination that have begun to produce structures at a greater rate than previously possible. These new structures have revealed many unexpected functional inferences and evolutionary relationships that were hidden at the sequence level. Here, we present samples of structures determined at the Berkeley Structural Genomics Center and collaborating laboratories to illustrate how structural information provides and complements sequence information to deduce the functional inferences of proteins with unknown molecular functions.
Two of the major premises of structural genomics are to discover a complete repertoire of protein folds in nature and to find molecular functions of the proteins whose functions are not predicted from sequence comparison alone. To achieve these objectives on a genomic scale, new methods, protocols, and technologies need to be developed by multi-institutional collaborations worldwide. As part of this effort, the Protein Structure Initiative has been launched in the United States (PSI; www.nigms.nih.gov/funding/psi.html).
Although infrastructure building and technology development are still the main focus of structural genomics programs [1–6], a considerable number of protein structures have already been produced, some of them coming directly out of semi-automated structure determination pipelines [6–10]. The Berkeley Structural Genomics Center (BSGC) has focused on the proteins of Mycoplasma or their homologues from other organisms as its structural genomics targets because of the minimal genome size of the Mycoplasmas as well as their relevance to human and animal pathogenicity (http://www.strgen.org). Here we present several protein examples encompassing a spectrum of functional inferences obtainable from their three-dimensional structures in five situations, where the inferences are new and testable, and are not predictable from protein sequence information alone.

2.
The problem of rational target selection for protein structure determination in structural genomics projects on microbes is addressed. A flexible computational procedure is described that directly incorporates the whole body of annotation available in the PEDANT genome database into the sequence clustering and selection process in order to identify proteins that are likely to possess currently unknown structural domains. Filtering out gene products based on predicted structural features, such as known three-dimensional structures and transmembrane regions, allows one to reduce the complexity of neighbor relationships between sequences and all but eliminates the need for further partitioning of single-linkage clusters into disjoint protein groups corresponding to homologous families. The results of a large-scale computation experiment in which exemplary target selection for 32 prokaryotic genomes was conducted are presented.
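
The effect of filtering before single-linkage clustering can be illustrated with a small sketch. It is not the PEDANT-based procedure itself: the function names, the protein identifiers, and the idea of passing similarity as a plain list of pairs are all assumptions made for illustration. It only shows how removing a protein with a known structure can break a chain of neighbor relationships that would otherwise merge unrelated groups into one cluster.

```python
def single_linkage_clusters(items, similar_pairs):
    """Group items so that any two connected by a chain of similar pairs share a cluster."""
    parent = {item: item for item in items}

    def find(x):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


def select_target_clusters(items, similar_pairs, excluded):
    """Drop excluded proteins (hypothetical filter, e.g. known 3D structure) and then cluster."""
    kept = [item for item in items if item not in excluded]
    kept_set = set(kept)
    pairs = [(a, b) for a, b in similar_pairs if a in kept_set and b in kept_set]
    return single_linkage_clusters(kept, pairs)


# Hypothetical example: p3 links two otherwise separate groups.
proteins = ["p1", "p2", "p3", "p4", "p5"]
pairs = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]
unfiltered = single_linkage_clusters(proteins, pairs)
filtered = select_target_clusters(proteins, pairs, excluded={"p3"})
```

Without the filter all five proteins chain into one cluster; with `p3` removed, two disjoint groups remain.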

3.
Structure-based prediction of DNA target sites by regulatory proteins   Cited by: 15 (self-citations: 0; citations by others: 15)
Kono H, Sarai A. Proteins, 1999, 35(1): 114-131
Regulatory proteins play a critical role in controlling complex spatial and temporal patterns of gene expression in higher organisms, by recognizing multiple DNA sequences and regulating multiple target genes. Increasing amounts of structural data on protein-DNA complexes provide clues to the mechanism of target recognition by regulatory proteins. Analyses of the propensities of base-amino acid interactions observed in those structural data show that there is no one-to-one correspondence in the interaction, but clear preferences exist. On the other hand, analysis of the spatial distribution of amino acids around bases shows that even amino acids with a strong base preference, such as Arg with G, are distributed in a wide space around bases. Thus, amino acids with many different geometries can form a similar type of interaction with bases. The redundancy and structural flexibility in the interaction suggest that there are no simple rules in sequence recognition, and that its prediction is not straightforward. However, the spatial distributions of amino acids around bases indicate that the structural data could be used to derive empirical interaction potentials between amino acids and bases. Such information extracted from structural databases has been successfully used to predict amino acid sequences that fold into particular protein structures. We surmised that the structures of protein-DNA complexes could be used to predict DNA target sites for regulatory proteins, because determining DNA sequences that bind to a particular protein structure should be similar to finding amino acid sequences that fold into a particular structure. Here we demonstrate that the structural data can be used to predict DNA target sequences for regulatory proteins. Pairwise potentials that determine the interaction between bases and amino acids were empirically derived from the structural data.
These potentials were then used to examine the compatibility between DNA sequences and the protein-DNA complex structure in a combinatorial "threading" procedure. We applied this strategy to the structures of protein-DNA complexes to predict DNA binding sites recognized by regulatory proteins. To test the applicability of this method in target-site prediction, we examined the effects of cognate and noncognate binding, cooperative binding, and DNA deformation on the binding specificity, predicted binding sites in real promoters, and compared the predictions with experimental data. These results show that target binding sites for several regulatory proteins are successfully predicted, and our data suggest that this method can serve as a powerful tool for predicting multiple target sites and target genes for regulatory proteins.
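
The two steps described above, deriving pairwise potentials from observed contacts and threading candidate DNA sequences through a fixed complex, can be sketched in a few lines. This is a deliberately simplified illustration and not the authors' method: the abstract derives potentials from spatial distributions, whereas the sketch below uses plain contact counts with an inverse-Boltzmann form, E = -ln(observed / expected). The function names, the pseudocount, the toy contact list, and the `contact_map` representation are all assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"


def derive_potential(contacts, pseudocount=1.0):
    """Turn observed (base, amino_acid) contacts into energies E = -ln(observed / expected).

    The expected count assumes bases and amino acids pair independently.
    """
    pair_counts = Counter(contacts)
    base_counts = Counter(base for base, _ in contacts)
    aa_counts = Counter(aa for _, aa in contacts)
    total = len(contacts)
    potential = {}
    for base in BASES:
        for aa in aa_counts:
            observed = pair_counts[(base, aa)] + pseudocount
            expected = base_counts[base] * aa_counts[aa] / total + pseudocount
            potential[(base, aa)] = -math.log(observed / expected)
    return potential


def thread_score(site, contact_map, potential):
    """Sum pair energies for a DNA site placed into the complex.

    contact_map lists (position_in_site, amino_acid) contacts taken from the structure.
    """
    return sum(potential[(site[pos], aa)] for pos, aa in contact_map)


def best_sites(dna, site_length, contact_map, potential, top=1):
    """Slide a window along the DNA and return the lowest-energy (score, start) pairs."""
    scored = [
        (thread_score(dna[i:i + site_length], contact_map, potential), i)
        for i in range(len(dna) - site_length + 1)
    ]
    return sorted(scored)[:top]


# Toy contact statistics (made up): Arg prefers G, Asn prefers A.
toy_contacts = [("G", "Arg")] * 8 + [("A", "Asn")] * 6 + [("G", "Asn"), ("A", "Arg")]
toy_potential = derive_potential(toy_contacts)
toy_contact_map = [(0, "Arg"), (1, "Asn")]
top_hit = best_sites("TTGACC", 2, toy_contact_map, toy_potential)[0]
```

In this toy case the window starting at index 2 (`GA`) receives the lowest energy, because it places G against Arg and A against Asn.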

4.
Assigning function to structures is an important aspect of structural genomics projects, since they frequently provide structures for uncharacterized proteins. Similarities uncovered by structure alignment can suggest a similar function, even in the absence of sequence similarity. For proteins adopting novel folds or those with many functions, this strategy can fail, but functional clues can still come from comparison of local functional sites involving a few key residues. Here we assess the general applicability of functional site comparison through the study of 157 proteins solved by structural genomics initiatives. For 17, the method bolsters confidence in predictions made based on overall fold similarity. For another 12 with new folds, it suggests functions, including a putative phosphotyrosine binding site in the Archaeal protein Mth1187 and an active site for a ribose isomerase. The approach is applied weekly to all new structures, providing a resource for those interested in using structure to infer function.

5.
Protein binding sites are the places where molecular interactions occur. Thus, the analysis of protein binding sites is of crucial importance to understand the biological processes proteins are involved in. Herein, we focus on the computational analysis of protein binding sites and present structure-based methods that enable function prediction for orphan proteins and prediction of target druggability. We present the general ideas behind these methods, with a special emphasis on the scopes and limitations of these methods and their validation. Additionally, we present some successful applications of computational binding site analysis to emphasize the practical importance of these methods for biotechnology/bioeconomy and drug discovery.

6.
The discovery by structure-based and combinatorial methods of new RNA-binding drugs presents great opportunities for pharmacological development against drug-resistant bacterial and viral pathogens. A handful of recent RNA structures and more numerous studies of the interaction of combinatorial libraries and oligomeric RNA-binding compounds are providing the foundation for effective RNA-targeted drug discovery programs.

7.
By its purest definition the ultimate goal of structural genomics (SG) is the determination of the structures of all proteins encoded by genomes. Most of these will be obtained by homology modeling using the structures of a set of target proteins for experimental determination. Thanks to the open exchange of SG target information, we are able to analyze the sequences of the current target list to evaluate the extent of its coverage of protein sequence space. The presence of homologous sequences currently either in the Protein Data Bank (PDB) or among SG targets has been determined for each of the protein sequences in several organisms. In this way we are able to evaluate the coverage by existing or targeted structural data for the non-membranous parts of entire proteomes. For small bacterial proteomes such as that of H. influenzae almost all proteins have homologous sequences among SG targets or in the PDB. There is significantly lower coverage for more complex organisms, such as C. elegans. We have mapped the SG target list onto the ProtoMap clustering of protein sequences. Clusters occupied by SG targets represent over 150,000 protein sequences, which is approximately 44% of the total protein sequences classified by ProtoMap. The mapping of SG targets also enables an evaluation of the degree of overlap within the target list. An SG target typically occupies a ProtoMap cluster with more than six other homologous targets.

8.
9.
10.
11.
The genus Campylobacter contains pathogens causing a wide range of diseases, targeting both humans and animals. Among them, the Campylobacter fetus subspecies fetus and venerealis deserve special attention, as they are the etiological agents of human bacterial gastroenteritis and bovine genital campylobacteriosis, respectively. We compare the whole genomes of both subspecies to gain insight into genomic architecture, phylogenetic relationships, genome conservation and core virulence factors. A pan-genomic approach was applied to identify the core and pan-genome for both C. fetus subspecies and members of the genus. The conserved proteome of the C. fetus subspecies (76%) was then analyzed for subcellular localization and protein functions in biological processes. Furthermore, with pathogenomic strategies, unique candidate regions in the genomes and several potential core virulence factors were identified. The potential candidate factors identified for attenuation and/or subunit vaccine development against C. fetus subspecies include: nucleoside diphosphate kinase (Ndk), type IV secretion systems (T4SS), outer membrane proteins (OMP), substrate binding proteins CjaA and CjaC, surface array proteins, sap gene, and cytolethal distending toxin (CDT). Significantly, many of those genes were found in genomic regions with signals of horizontal gene transfer and were therefore predicted to be putative pathogenicity islands. We found CRISPR loci and dam genes in an island specific for C. fetus subsp. fetus, and T4SS and sap genes in an island specific for C. fetus subsp. venerealis. The genomic variations and the potential core and unique virulence factors characterized in this study should lead to better insight into the species' virulence and to more efficient use of the candidates for antibiotic, drug and vaccine development.

12.
Binding of short-chain phosphatidylserine (C6PS) enhances the proteolytic activity of factor Xa by 60-fold (Koppaka, V., Wang, J., Banerjee, M., and Lentz, B. R. (1996) Biochemistry 35, 7482-7491). In the present study, we locate three C6PS binding sites to different domains of factor Xa using a combination of activity, circular dichroism, fluorescence, and equilibrium dialysis measurements on proteolytic and biosynthetic fragments of factor Xa. Our results demonstrate that the structural responses of human and bovine factor Xa to C6PS binding are somewhat different. Despite this difference, data obtained with fragments from both human and bovine factor Xa are consistent with a common hypothesis for the location of C6PS binding sites to different structural domains. First, the gamma-carboxyglutamic acid (Gla) domain binds C6PS only in the absence of Ca²⁺ (Kd approximately 1 mM), although this PS site does not influence the functional response of factor Xa. Second, a Ca²⁺-dependent binding site is in the epidermal growth factor domains (EGF(NC)) that are linked by Ca²⁺ and C6PS binding to the Gla domain. This site appears to be the lipid regulatory site of factor Xa. Third, a Ca²⁺-requiring site seems to be in the EGF(C)-catalytic domain. This site appears not to be a lipid regulatory site but rather to share residues with the substrate recognition site. Finally, the full functional response to C6PS requires linkage of the Gla, EGF(NC), and catalytic domains in the presence of Ca²⁺, meaning that PS regulation of factor Xa involves linkage between widely separated parts of the protein.

13.
The objective of this study is to automatically identify regions of the human proteome that are suitable for 3D structure determination by X-ray crystallography and to annotate them according to their likelihood of producing diffraction-quality crystals. The results provide a powerful tool for structural genomics laboratories that wish to select human proteins based on the statistical likelihood of crystallisation success. Combining fold recognition and crystallisation prediction algorithms enables the efficient calculation of the crystallisability of the entire human proteome. This novel study estimates that there are approximately 40,000 crystallisable regions in the human proteome. Currently, only 15% of these regions (approx. 6,000 sequences) have been solved to at least 95% sequence identity. The remaining unsolved regions have been categorised into 5 crystallisation classes and an integral membrane protein (IMP) class, based on established structure prediction, crystallisation prediction and transmembrane (TM) helix prediction algorithms. Approximately 750 unsolved regions (2% of the proteome) have been identified as having a PDB fold representative (template) and an ‘optimal’ likelihood of crystallisation. At the other end of the spectrum, more than 10,500 non-IMP regions with a PDB template are classified as ‘very difficult’ to crystallise (26%) and almost 2,500 regions (6%) were predicted to contain at least 3 TM helices. The 3D-SPECS (3D Structural Proteomics Explorer with Crystallisation Scores) website contains crystallisation predictions for the entire human proteome and can be found at .

14.

Background  

RNA-protein interactions are important for a wide range of biological processes. Current computational methods to predict interacting residues in RNA-protein interfaces predominantly rely on sequence data. It is, however, known that interface residue propensity is closely correlated with structural properties. In this paper we systematically study information obtained from sequences and structures and compare their contributions to this prediction problem. In particular, different geometrical and network topological properties of protein structures are evaluated to improve interface residue prediction accuracy.

15.
We describe here a systematic approach to the identification of human proteins and protein fragments that can be expressed as soluble proteins in Escherichia coli. A cDNA expression library of 10,825 clones was screened by small-scale expression and purification and 2,746 clones were identified. Sequence and protein-expression data were entered into a public database. A set of 163 clones was selected for structural analysis and 17 proteins were prepared for crystallization, leading to three new structures.

16.
We propose a method for constructing classifiers using logical combinations of elementary rules. The method is a form of rule-based classification, which has been widely discussed in the literature. In this work we focus specifically on issues that arise in the context of classifying cell samples based on RNA or protein expression measurements. The basic idea is to specify elementary rules that exhibit a locally strong pattern in favor of a single class. Strict admissibility criteria are imposed to produce a manageable universe of elementary rules. Then the elementary rules are combined using a set covering algorithm to form a composite rule that achieves a perfect fit to the training data. The user has explicit control over a parameter that determines the composite rule's level of redundancy and parsimony. This built-in control, along with the simplicity of interpreting the rules, makes the method particularly useful for classification problems in genomics. We demonstrate the new method using several microarray datasets and examine its generalization performance. We also draw comparisons to other machine-learning strategies such as CART, ID3, and C4.5.
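
The set covering step can be sketched with the standard greedy heuristic. The abstract does not say which set covering algorithm the authors use or how their redundancy parameter is defined, so everything below is an assumption: the function name, the rule names, and the reading of `redundancy` as "each training sample must be covered by at least this many chosen rules".

```python
def greedy_rule_cover(samples, rule_coverage, redundancy=1):
    """Pick elementary rules until every sample is covered `redundancy` times.

    rule_coverage maps a rule name to the set of training samples it classifies
    correctly. Ties are broken by rule name so the result is deterministic.
    """
    remaining = {sample: redundancy for sample in samples}
    unused = dict(rule_coverage)
    chosen = []

    def gain(rule):
        # Number of still-needed samples this rule would help cover.
        return sum(1 for s in unused[rule] if remaining.get(s, 0) > 0)

    while any(count > 0 for count in remaining.values()):
        best = max(sorted(unused), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("rules cannot cover every sample at this redundancy")
        chosen.append(best)
        for s in unused.pop(best):
            if remaining.get(s, 0) > 0:
                remaining[s] -= 1
    return chosen


# Hypothetical elementary rules over five training samples.
rules = {
    "geneA>2": {1, 2, 3},
    "geneB<1": {3, 4},
    "geneC>5": {4, 5},
    "geneD>0": {1, 5},
    "geneE<3": {2, 5},
}
parsimonious = greedy_rule_cover(range(1, 6), rules)                # ['geneA>2', 'geneC>5']
redundant = greedy_rule_cover(range(1, 6), rules, redundancy=2)
```

Raising `redundancy` trades parsimony for robustness: more rules are selected, but each sample is backed by more than one of them.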

17.
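In the post-genomic era, as complete genome sequences have become available for a large number of species, structural biologists face the new opportunities and challenges of structural genomics. Unlike traditional structural biology, structural genomics focuses mainly on proteins whose structure and function are unknown and which share little similarity with previously studied proteins. More precisely, structural genomics uses high-throughput protein expression and structure determination to characterize the structures of all protein families, so that function can be predicted from structure. The Joint Center for Structural Genomics in California has developed a highly automated pipeline for protein synthesis, crystallization, and structure determination. However, because some proteins cannot be crystallized, covering all protein domains remains very difficult. Wuthrich's research group has addressed this problem with high-throughput methods for screening target proteins and determining their structures by NMR. Compared with X-ray crystallography, NMR is complementary because it can determine solution structures that are closer to the physiological state. By providing information on protein stability, dynamics, and interactions in solution, as shown in studies of prion proteins and SARS-related proteins, NMR plays an exciting role in areas ranging from expanding the database of known protein structures and identifying new protein functions to chemical biology research.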

18.
Amino acids do not occur randomly in proteins; rather, their occurrence at any given site is strongly influenced by the amino acid composition at other sites, the structural and functional aspects of the region of the protein in which they occur, and the evolutionary history of the protein. The goal of our study is to identify networks of coevolving sites within the serpin proteins (serine protease inhibitors) and classify them as being caused by structural-functional constraints or by evolutionary history. To address this, a matrix of pairwise normalized mutual information (NMI) values was computed among amino acid sites for the serpin proteins. The NMI matrix was partitioned into orthogonal patterns of amino acid variability by factor analysis. Each common factor pattern was interpreted as having phylogenetic and/or structural-functional explanations. In addition, we used a bootstrap factor analysis technique to limit the effects of phylogenetic history on our factor patterns. Our results show an extensive network of correlations among amino acid sites in key functional regions (reactive center loop, shutter, and breach). Additionally, we have discovered long-range coevolution for packed amino acids within the serpin protein core. Lastly, we have discovered a group of serpin sites which coevolve in the hydrophobic core region (s5B and s4B) and appear to represent sites important for formation of the "native" instead of the "latent" serpin structure. This research provides a better understanding of how protein structure evolves; in particular, it elucidates the selective forces creating coevolution among protein sites.
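
The pairwise NMI matrix mentioned above can be computed from alignment columns as in the following sketch. The abstract does not state which normalization the authors used; dividing the mutual information by the joint entropy, NMI = (H(X) + H(Y) - H(X,Y)) / H(X,Y), is one common choice and is an assumption here, as are the function names and the toy alignment. Gap handling, sequence weighting, and the subsequent factor analysis are not shown.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def normalized_mutual_information(col_x, col_y):
    """Mutual information of two alignment columns divided by their joint entropy."""
    h_x, h_y = entropy(col_x), entropy(col_y)
    h_xy = entropy(list(zip(col_x, col_y)))
    if h_xy == 0:
        # Both columns are fully conserved, so there is no covariation to measure.
        return 0.0
    return (h_x + h_y - h_xy) / h_xy


def nmi_matrix(alignment):
    """Pairwise NMI between all columns of an alignment given as equal-length strings."""
    columns = list(zip(*alignment))
    return [
        [normalized_mutual_information(cx, cy) for cy in columns]
        for cx in columns
    ]


# Toy alignment (made up): columns 0 and 1 covary perfectly, column 2 only weakly.
toy_alignment = ["AKL", "AKL", "GEL", "GEV"]
toy_nmi = nmi_matrix(toy_alignment)
```

In the toy alignment the first two columns change together in every sequence, so their NMI is 1, while the third column carries much less shared information with either of them.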

19.
20.