Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The dramatically increasing number of new protein sequences arising from genomics and proteomics creates a need for methods to rapidly and reliably infer the molecular and cellular functions of these proteins. One such approach, structural genomics, aims to delineate the total repertoire of protein folds in nature, thereby providing three-dimensional folding patterns for all proteins, and to infer molecular functions of the proteins based on the combined information of structures and sequences. The goal of obtaining protein structures on a genomic scale has motivated the development of high-throughput technologies and protocols for macromolecular structure determination that have begun to produce structures at a greater rate than previously possible. These new structures have revealed many unexpected functional inferences and evolutionary relationships that were hidden at the sequence level. Here, we present samples of structures determined at the Berkeley Structural Genomics Center and collaborators' laboratories to illustrate how structural information complements sequence information in deducing the functions of proteins whose molecular functions are unknown. Two of the major premises of structural genomics are to discover a complete repertoire of protein folds in nature and to find molecular functions of the proteins whose functions are not predicted from sequence comparison alone. To achieve these objectives on a genomic scale, new methods, protocols, and technologies need to be developed by multi-institutional collaborations worldwide. As part of this effort, the Protein Structure Initiative has been launched in the United States (PSI; www.nigms.nih.gov/funding/psi.html).
Although infrastructure building and technology development are still the main focus of structural genomics programs [1–6], a considerable number of protein structures have already been produced, some of them coming directly out of semi-automated structure determination pipelines [6–10]. The Berkeley Structural Genomics Center (BSGC) has focused on the proteins of Mycoplasma or their homologues from other organisms as its structural genomics targets because of the minimal genome size of the Mycoplasmas as well as their relevance to human and animal pathogenicity (http://www.strgen.org). Here we present several protein examples encompassing a spectrum of functional inferences obtainable from their three-dimensional structures in five situations, where the inferences are new and testable, and are not predictable from protein sequence information alone.

2.
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.

3.
MOTIVATION: Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. RESULTS: We have systematically analyzed the amino acid composition of globular proteins from different structural classes and outer membrane proteins. We found that the residues, Glu, His, Ile, Cys, Gln, Asn and Ser, show a significant difference between globular and outer membrane proteins. Based on this information, we have devised a statistical method for discriminating outer membrane proteins from other globular and membrane proteins. Our approach correctly picked up the outer membrane proteins with an accuracy of 89% for the training set of 337 proteins. On the other hand, our method has correctly excluded the globular proteins at an accuracy of 79% in a non-redundant dataset of 674 proteins. Furthermore, the present method is able to correctly exclude alpha-helical membrane proteins up to an accuracy of 80%. These accuracy levels are comparable to other methods in the literature, and this is a simple method, which could be used for dissecting outer membrane proteins from genomic sequences. The influence of protein size, structural class and specific residues for discrimination is discussed.
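
The abstract above does not state which statistic the authors used, so the following is only a minimal sketch of a composition-based discriminator under an assumed rule: a sequence is assigned to whichever reference composition profile it deviates from least. The function names (`composition`, `deviation`, `classify`) and the nearest-profile rule are illustrative assumptions, not the published method, and real reference profiles would have to be computed from curated training sets.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the percentage of each of the 20 standard residues in seq."""
    counts = Counter(r for r in seq.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def deviation(comp, reference):
    """Sum of absolute per-residue differences between two profiles."""
    return sum(abs(comp[aa] - reference[aa]) for aa in AMINO_ACIDS)


def classify(seq, omp_ref, globular_ref):
    """Assumed rule: pick the reference profile with the smaller deviation."""
    comp = composition(seq)
    if deviation(comp, omp_ref) < deviation(comp, globular_ref):
        return "outer membrane"
    return "globular"
```

In practice `omp_ref` and `globular_ref` would be the mean compositions of the two training sets; restricting `deviation` to the residues reported as discriminating (Glu, His, Ile, Cys, Gln, Asn, Ser) would be a natural variation.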

4.
5.
To facilitate swift structural characterizations, structural genomic/proteomic projects need to divide large multi-domain proteins into structural domains and to determine their structures separately. Thus, the assignment of structural domains based solely on sequence information, especially on the physico-chemical properties of the amino acid sequences, could be very helpful for such projects. In this study, we examined the characteristics of domain linker sequences, which are loop sequences connecting two structural domains. To this end, we prepared a set of 101 non-redundant multi-domain protein sequences with known structures, and performed an analysis of the linker sequences. The analysis revealed that the frequencies of five amino acid residues (Pro, Gly, Asp, Asn, Lys) differed significantly between the linker and non-linker loop sequences. Moreover, we observed a similar deviation for the residue pair frequencies between the two types of loop sequences. Finally, we describe an automated method, based on the above analysis, to detect loops that have high probabilities of being domain linkers in a protein sequence.
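
One way to turn such frequency differences into a detector is a per-residue log-odds table, sketched below. This is an assumption about how the analysis could be operationalized, not the authors' automated method; the function names, the pseudocount, and the mean-score rule are all hypothetical, and the residue-pair frequencies mentioned in the abstract are not modeled.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(loops, pseudocount=1.0):
    """Smoothed residue frequencies over a collection of loop sequences."""
    counts = Counter(r for loop in loops for r in loop if r in ALPHABET)
    total = sum(counts.values()) + pseudocount * len(ALPHABET)
    return {aa: (counts[aa] + pseudocount) / total for aa in ALPHABET}


def log_odds(linker_loops, other_loops):
    """log(f_linker / f_non_linker) for each residue type."""
    f_link = residue_frequencies(linker_loops)
    f_other = residue_frequencies(other_loops)
    return {aa: math.log(f_link[aa] / f_other[aa]) for aa in ALPHABET}


def loop_score(loop, table):
    """Mean log-odds of a loop; positive values favor 'linker'."""
    scored = [table[r] for r in loop if r in table]
    return sum(scored) / len(scored) if scored else 0.0
```

Loops whose score exceeds a threshold chosen on held-out data would then be flagged as candidate domain linkers.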

6.
Genomic DNA in eukaryotes is organized into chromatin through association with core histone proteins to form nucleosomes. To understand the structure and function of chromatin, we must determine the structures of nucleosomes containing native DNA sequences. However, to date, our knowledge of nucleosome structures is mainly based on crystallographic studies of nucleosomes containing non-native DNA sequences. Here, we discuss the technical issues related to the determination of nucleosome structures and review the few structural studies on native-like nucleosomes. We show how antibody fragment-aided single-particle cryo-EM can be a useful method to determine the structures of nucleosomes containing genomic DNA. Finally, we provide a perspective for future structural studies of some native-like nucleosomes that play critical roles in chromatin functions.

7.
Detailed knowledge of the three-dimensional structures of biological molecules has had an enormous impact on all areas of biological science, including genetics, as structure can reveal the fine details of how molecules perform their biological functions. Here we consider how changes in protein sequence affect the corresponding 3D structure, and describe how structural information about proteins, DNA and chromatin has shed light on gene regulatory mechanisms and the storage and transmission of epigenetic information. Finally, we describe how structure determination is benefiting from the high-throughput technologies of the worldwide structural genomics projects.

8.
How can we make the connection between the three-dimensional structures of individual proteins and understanding how complex biological systems involving many proteins work? The modelling and simulation of protein structures can help to answer this question for systems ranging from multimacromolecular complexes to organelles and cells. On one hand, multiscale modelling and simulation techniques are advancing to permit the spatial and temporal properties of large systems to be simulated using atomic-detail structures. On the other hand, the estimation of kinetic parameters for the mathematical modelling of biochemical pathways using protein structure information provides a basis for iterative manipulation of biochemical pathways guided by protein structure. Recent advances include the structural modelling of protein complexes on the genomic level, novel coarse-graining strategies to increase the size of the system and the time span that can be simulated, and comparative molecular field analyses to estimate enzyme kinetic parameters.

9.
The tetratricopeptide repeat (TPR) is a 34-amino acid alpha-helical motif that occurs in over 300 different proteins. In the different proteins, three to sixteen or more TPR motifs occur in tandem arrays and function to mediate protein-protein interactions. The binding specificity of each TPR protein is different, although the underlying structural motif is the same. Here we describe a statistical approach to the design of an idealized TPR motif. We present the high-resolution X-ray crystal structures (to 1.55 and 1.6 Å) of designed TPR proteins and describe their solution properties and stability. A detailed analysis of these structures provides an understanding of the TPR motif, how it is repeated to give helical arrays with different superhelical twists, and how a very stable framework may be constructed for future functional designs.
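
The abstract does not spell out the statistical procedure. A common statistical route to an idealized repeat is a per-position consensus over aligned copies of the motif, and the sketch below illustrates that idea only; whether it matches the authors' approach is an assumption, and the function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent non-gap residue at each column of an alignment.

    `aligned` is a list of equal-length strings; '-' marks a gap.
    A column that contains only gaps yields '-'.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*aligned):
        counts = Counter(r for r in column if r != "-")
        result.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(result)
```

For a real design, the input would be many aligned 34-residue TPR repeats, and positions with weak preferences would need to be resolved by other criteria.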

10.
Life in the fast lane for protein crystallization and X-ray crystallography   Cited by: 3 (self-citations: 0, cited by others: 3)
The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain “low-hanging fruit” protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits.
Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).

11.
The field of structural biology is becoming increasingly important as new technological developments facilitate the collection of data on the atomic structures of proteins and nucleic acids. The solid-state NMR method is a relatively new biophysical technique that holds particular promise for determining the structures of peptides and proteins that are located within the cell membrane. This method provides information on the orientation of the peptide planes relative to an external magnetic field. In this article, we discuss some of the mathematical methods and tools that are useful in deriving the atomic structure from these orientational data. We first discuss how the data are viewed as tensors, and how these tensors can be used to construct an initial atomic model, assuming ideal stereochemistry. We then discuss methods for refining the models using global optimization, with stereochemistry constraints treated as penalty functions. These two processes, initial model building followed by refinement, are the two crucial steps between data collection and the final atomic model.

12.
Membrane proteins control a large number of vital biological processes and are often medically important, not least as drug targets. However, membrane proteins are generally more difficult to work with than their globular counterparts, and as a consequence comparatively few high‐resolution structures are available. In any membrane protein structure project, a lot of effort is usually spent on obtaining a pure and stable protein preparation. The process commonly involves the expression of several constructs and homologs, followed by extraction in various detergents. This is normally a time‐consuming and highly iterative process since only one or a few conditions can be tested at a time. In this article, we describe a rapid screening protocol in a 96‐well format that largely mimics standard membrane protein purification procedures, but eliminates the ultracentrifugation and membrane preparation steps. Moreover, we show that the results are robustly translatable to large‐scale production of detergent‐solubilized protein for structural studies. We have applied this protocol to 60 proteins from an E. coli membrane protein library, in order to find the optimal expression, solubilization and purification conditions for each protein. With guidance from the obtained screening data, we have also performed successful large‐scale purifications of several of the proteins. The protocol provides a rapid, low-cost solution to one of the major bottlenecks in structural biology, making membrane protein structures attainable even for the small laboratory.

13.
The ability to maintain intact ribosomes in the mass spectrometer has enabled research into their changes in conformation and interactions. In the mass spectrometer, it is possible to induce dissociation of proteins from the intact ribosome and, in conjunction with atomic structures, to understand the factors governing their release. We have applied this knowledge to interpret the structural basis for release of proteins from ribosomes for which no high-resolution structures are available, such as complexes with elongation factor G and ribosomes from yeast. We also describe how improvements in technology and understanding have widened the scope of our research and led to dramatic improvements in the quality of, and information available from, spectra of intact ribosomes.

14.
Structural genomic projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains to an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We thus have developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. 
Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide an efficient support to structural genomics projects.
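
The geometric idea described in this abstract, plotting the number of similar sequences at each query residue and looking for rectangular elevations, can be sketched as follows. This is a simplified illustration, not the authors' program: the input format (a list of 1-based inclusive query intervals taken from BLAST hits), the function names, and the fixed `min_hits` and `min_length` thresholds are all assumptions.

```python
def coverage_profile(query_length, hits):
    """Count how many hit intervals cover each query residue.

    `hits` is a list of (start, end) pairs, 1-based and inclusive.
    Returns a list of length `query_length`.
    """
    diff = [0] * (query_length + 1)
    for start, end in hits:
        if not 1 <= start <= end <= query_length:
            raise ValueError(f"invalid interval: {(start, end)}")
        diff[start - 1] += 1   # coverage rises at the interval start
        diff[end] -= 1         # and falls just past the interval end
    profile, running = [], 0
    for delta in diff[:query_length]:
        running += delta
        profile.append(running)
    return profile


def elevated_segments(profile, min_hits, min_length):
    """Return 1-based (start, end) runs where coverage >= min_hits."""
    if min_hits < 1:
        raise ValueError("min_hits must be at least 1")
    segments, start = [], None
    # A trailing zero acts as a sentinel that closes a run at the end.
    for i, depth in enumerate(profile + [0]):
        if depth >= min_hits and start is None:
            start = i
        elif depth < min_hits and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    return segments
```

A simple threshold treats any sufficiently deep plateau as a candidate fragment; a fuller treatment would also test how sharply the profile rises and falls at the segment edges.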

15.
We describe several algorithms and public servers that were developed to analyze and predict various features of protein structures. These servers provide information about the covalent state of cysteine (CYSREDOX), as well as about residues involved in non-covalent cross links that play an important role in the structural stability of proteins (SCIDE and SCPRED). We also discuss methods and servers developed to identify helical transmembrane proteins from large databases and rough genomic data, including two of the most popular transmembrane prediction methods, DAS and HMMTOP. Several biologically interesting applications of these servers are also presented. The servers are available through http://www.enzim.hu/servers.html.

16.
Reproductive proteins maintain species‐specific barriers to fertilization, affect the outcome of sperm competition, mediate reproductive conflicts between the sexes, and potentially contribute to the formation of new species. However, the specific proteins and molecular mechanisms that underlie these processes are understood in only a handful of cases. Advances in genomic and proteomic technologies enable the identification of large suites of reproductive proteins, making it possible to dissect reproductive phenotypes at the molecular level. We first review these technological advances and describe how reproductive proteins are identified in diverse animal taxa. We then discuss the dynamic evolution of reproductive proteins and the potential selective forces that act on them. Finally, we describe molecular and genomic tools for functional analysis and detail how evolutionary data may be used to make predictions about interactions among reproductive proteins.

17.
H/ACA RNA-protein complexes, comprised of four proteins and an H/ACA guide RNA, modify ribosomal and small nuclear RNAs. The H/ACA proteins are also essential components of telomerase in mammals. Cbf5 is the H/ACA protein that catalyzes isomerization of uridine to pseudouridine in target RNAs. Mutations in human Cbf5 (dyskerin) lead to dyskeratosis congenita. Here, we describe the 2.1 Å crystal structure of a specific complex of three archaeal H/ACA proteins, Cbf5, Nop10, and Gar1. Cbf5 displays structural properties that are unique among known pseudouridine synthases and are consistent with its distinct function in RNA-guided pseudouridylation. We also describe the previously unknown structures of both Nop10 and Gar1 and the structural basis for their essential roles in pseudouridylation. By using information from related structures, we have modeled the entire ribonucleoprotein complex including both guide and substrate RNAs. We have also identified a dyskeratosis congenita mutation cluster site within a modeled dyskerin structure.

18.
Intrinsically disordered proteins (IDPs) defy the structure-function paradigm as they fulfill essential biological functions while lacking well-defined secondary and tertiary structures. Conformational and spectroscopic analyses showed that IDPs do not constitute a uniform family, and can be divided into subfamilies as a function of their residual structure content. Residual intramolecular interactions are thought to facilitate binding to a partner and then induced folding. Comprehensive information about experimental approaches to investigate structural disorder and induced folding is still scarce. We herein provide hints to readily recognize features typical of intrinsic disorder and review the principal techniques to assess structural disorder and induced folding. We describe their theoretical principles and discuss their respective advantages and limitations. Finally, we point out the necessity of using different approaches and show how information can be broadened by the use of multiple techniques.

19.
Innate immune responses, such as cell death and inflammatory signaling, are typically switch-like in nature. They also involve “prion-like” self-templating polymerization of one or more signaling proteins into massive macromolecular assemblies known as signalosomes. Despite the wealth of atomic-resolution structural information on signalosomes, how the constituent polymers nucleate and whether the switch-like nature of that event at the molecular scale relates to the digital nature of innate immune signaling at the cellular scale remains unknown. In this perspective, we review current knowledge of innate immune signalosome assembly, with an emphasis on structural constraints that allow the proteins to accumulate in inactive soluble forms poised for abrupt polymerization. We propose that structurally encoded nucleation barriers to protein polymerization kinetically regulate the corresponding pathways, which allows for extremely sensitive, rapid, and decisive signaling upon pathogen detection. We discuss how nucleation barriers satisfy the rigorous on-demand functions of the innate immune system but also predispose the system to precocious activation that may contribute to progressive age-associated inflammation.

20.
Given the massive increase in the number of new sequences and structures, a critical problem is how to integrate these raw data into meaningful biological information. One approach, the Evolutionary Trace, or ET, uses phylogenetic information to rank the residues in a protein sequence by evolutionary importance and then maps those ranked at the top onto a representative structure. If these residues form structural clusters, they can identify functional surfaces such as those involved in molecular recognition. Now that a number of examples have shown that ET can identify binding sites and focus mutational studies on their relevant functional determinants, we ask whether the method can be improved so as to be applicable on a large scale. To address this question, we introduce a new treatment of gaps resulting from insertions and deletions, which streamlines the selection of sequences used as input. We also introduce objective statistics to assess the significance of the total number of clusters and of the size of the largest one. As a result of the novel treatment of gaps, ET performance improves measurably. We find evolutionarily privileged clusters that are significant at the 5% level in 45 out of 46 (98%) proteins drawn from a variety of structural classes and biological functions. In 37 of the 38 proteins for which a protein-ligand complex is available, the dominant cluster contacts the ligand. We conclude that spatial clustering of evolutionarily important residues is a general phenomenon, consistent with the cooperative nature of residues that determine structure and function. In practice, these results suggest that ET can be applied on a large scale to identify functional sites in a significant fraction of the structures in the Protein Data Bank (PDB). This approach to combining raw sequences and structure to obtain detailed insights into the molecular basis of function should prove valuable in the context of the Structural Genomics Initiative.
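
The two quantities whose significance the abstract assesses, the number of clusters and the size of the largest one, can be computed from the coordinates of the top-ranked residues with a simple connected-components pass. The sketch below assumes that two residues belong to the same cluster when one representative atom of each lies within a distance cutoff; the 4.0 Å default, the single-atom representation, and the function name `spatial_clusters` are illustrative assumptions rather than the ET definition, and the significance statistics themselves are not reproduced.

```python
import math


def spatial_clusters(coords, cutoff=4.0):
    """Group residues into clusters of mutually reachable neighbors.

    `coords` maps a residue identifier to an (x, y, z) tuple for one
    representative atom. Returns a list of clusters (lists of residue
    identifiers), largest first.
    """
    ids = list(coords)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for index, a in enumerate(ids):
        for b in ids[index + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

Comparing `len(clusters)` and `len(clusters[0])` against the same quantities for randomly chosen residue sets of equal size is one plausible way to frame a significance test.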
