Similar articles
1.
Liu J  Rost B 《Proteins》2004,55(3):678-688
We developed a method, CHOP, that dissects proteins into domain-like fragments. The basic idea was to cut proteins beginning from very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and completing through cuts based on termini of known proteins. In this way, CHOP dissected more than two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, domains spanned approximately 100 residues on average. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid (although not previously described) for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constitute an important resource for functional and structural genomics. Nevertheless, our main motivation to develop CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi- and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, approximately 24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to at least cover the multimember clusters in 62 proteomes.
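The abstract above notes that single-linkage clustering failed on full-length proteins. One common reason, sketched below, is chaining: a multidomain protein that resembles two otherwise unrelated proteins pulls them into the same cluster. This is only an illustrative sketch; the protein names (`A_only`, `B_only`, `AB`) and the similarity pairs are hypothetical, and the code is not the CHOP or CLUP implementation, whose details the abstract does not give.

```python
def single_linkage_clusters(items, similar_pairs):
    """Group items so that any two joined by a chain of similar pairs share a cluster."""
    parent = {item: item for item in items}

    def find(x):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())


# Hypothetical full-length proteins: AB contains one A-like and one B-like domain.
full_length_clusters = single_linkage_clusters(
    ["A_only", "B_only", "AB"],
    [("A_only", "AB"), ("B_only", "AB")],
)

# The same proteins after AB has been cut into its two domain-like fragments.
fragment_clusters = single_linkage_clusters(
    ["A_only", "B_only", "AB:frag1", "AB:frag2"],
    [("A_only", "AB:frag1"), ("B_only", "AB:frag2")],
)

print(len(full_length_clusters), "cluster(s) for full-length proteins")
print(len(fragment_clusters), "cluster(s) for fragments")
```

With full-length proteins, `A_only` and `B_only` end up together although they share no domain; after fragmentation each cluster contains only pieces that resemble one another.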

2.
Liu J  Hegyi H  Acton TB  Montelione GT  Rost B 《Proteins》2004,56(2):188-200
A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of "all families" on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify all protein domain families from currently five entirely sequenced eukaryotic target organisms based on their sequence homology, 2) discard those families that can be modeled on the basis of structural information already present in the PDB, and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common foldlike region, we had to begin by dissecting proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, utilizing homology to PrISM, Pfam-A, and SWISS-PROT chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared suitable targets that were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes.

3.
Zhao XM  Wang Y  Chen L  Aihara K 《Proteins》2008,72(1):461-473
Domains are structural and functional units of proteins and play an important role in functional genomics. Theoretically, the functions of a protein can be directly inferred if the biological functions of its component domains are determined. Despite the important role that domains play, only a small number of domains have been annotated so far, and few works have been performed to predict the functions of domains. Hence, it is necessary to develop automatic methods for predicting domain functions based on various available data. In this article, two new methods, that is, the threshold-based classification method and the support vector machines method, are proposed for protein domain function prediction by integrating heterogeneous information sources, including protein-domain mapping features, domain-domain interactions, and domain coexisting features. We show that the integration of heterogeneous information sources improves not only prediction accuracy but also annotation reliability when compared with the methods using only individual information sources.

4.
The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel with experimental methods, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with the known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e. structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. Combined with experimental methods, we hope that computational methods will make a leading contribution to the functional elucidation of protein structures.

5.
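Classification of protein fold types is an important part of protein classification research. Taking the PH domain-like barrel fold in the SCOP database as the subject, we selected 61 samples with sequence similarity below 25% as the test set. Through structural feature analysis, we determined the template for this fold and its corresponding characteristic parameters. Using the spatial structure alignment information between the template and the query protein, we proposed a new fold scoring function, Fscore, and established an Fscore-based fold classification method, which we applied to the classification of this fold. The method was tested on 16,711 samples from Astral 1.75 with sequence similarity below 95%, giving a classification specificity of 99.97%. The results show that the characteristic parameters capture the essence of the fold, and that the scoring function Fscore and the Fscore-based classification method can be used for automatic classification of the PH domain-like barrel protein fold.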

6.

Background  

Inference of remote homology between proteins is very challenging and remains a prerogative of an expert. Thus a significant drawback to the use of evolutionary-based protein structure classifications is the difficulty in assigning new proteins to unique positions in the classification scheme with automatic methods. To address this issue, we have developed an algorithm to map protein domains to an existing structural classification scheme and have applied it to the SCOP database.

7.
8.
Gamete recognition and adhesion are essential steps in the complex process of fertilization. In mammals and in other species, increasing evidence indicates that carbohydrate-binding proteins on the sperm surface play a pivotal role as counter-receptors for certain oligosaccharide moieties attached to the oocyte zona pellucida glycoproteins. Although different sperm-associated zona-pellucida-binding proteins have been identified in a number of species, few of them have been isolated and structurally characterized. In this paper we report the primary structural characterization of AQN-1, a 12-kDa boar-sperm-associated carbohydrate-binding and zona-pellucida-binding protein. The molecular mass of AQN-1 was determined by time-of-flight plasma-desorption mass spectrometry. Determination of its amino acid sequence and location of disulphide bridges were accomplished by a combination of proteochemical and mass spectrometric methods. The primary structure of AQN-1 failed to show any significant similarity to the protein structures deposited with the Martinsried Institute for Protein Sequences data bank, indicating that it may belong to a novel protein family involved in fertilization. AQN-1 shares extensive structural, as well as functional, similarity with two other boar sperm zona-pellucida-binding proteins, AQN-3 and AWN, which we have recently characterized. To name this protein family, we have coined the term spermadhesin. Our data may be relevant for identification of spermadhesins in other species, and thus may contribute to a better understanding of the species-specific sperm-egg recognition mechanism.

9.
The classic structure–function paradigm holds that a protein exhibits a single well-defined native state that gives rise to its biological function. Nonetheless, over the past few decades, numerous examples of proteins exhibiting biological function arising from multiple structural states of varying disorder have been identified. Most recently, several examples of ‘metamorphic proteins’, able to interconvert between vastly different native-like topologies under physiological conditions, have been characterised with multiple functions. In this review, we look at the concept of protein metamorphosis in relation to the current understanding of the protein structure–function landscape. Although structural dynamism observed for metamorphic proteins provides a novel source of functional versatility, the dynamic nature of the metamorphic proteins generally makes them difficult to identify and probe using conventional protein structure determination methods. However, as the existence of metamorphic proteins has now been established and techniques enabling the analysis of multiple protein conformers are improving, it is likely that this class will continue to grow in number.

10.
Goyal K  Mande SC 《Proteins》2008,70(4):1206-1218
High throughput structural genomics efforts have been making the structures of proteins available even before their function has been fully characterized. Therefore, methods that exploit the structural knowledge to provide evidence about the functions of proteins would be useful. Such methods would be needed to complement the sequence-based function annotation approaches. The current study describes generation of 3D-structural motifs for metal-binding sites from the known metalloproteins. It then scans all the available protein structures in the PDB database for putative metal-binding sites. Our analysis predicted more than 1000 novel metal-binding sites in proteins using three-residue templates, and more than 150 novel metal-binding sites using four-residue templates. Prediction of metal-binding site in a yeast protein YDR533c led to the hypothesis that it might function as metal-dependent amidopeptidase. The structural motifs identified by our method present novel metal-binding sites that reveal newer mechanisms for a few well-known proteins.

11.
Chemokines are small secreted proteins with important roles in immune responses. They share a conserved three-dimensional (3D) structure, the so-called IL8-like chemokine fold, which is supported by disulfide bridges characteristic of this protein family. Sequence- and profile-based computational methods have been proficient in discovering novel chemokines by making use of their sequence-conserved cysteine patterns. However, it has been recently shown that some chemokines escaped annotation by these methods due to low sequence similarity to known chemokines and to different arrangement of cysteines in sequence and in 3D. Innovative methods overcoming the limitations of current techniques may allow the discovery of new remote homologs in the still functionally uncharacterized fraction of the human genome. We report a novel computational approach for proteome-wide identification of remote homologs of the chemokine family that uses fold recognition techniques in combination with a scaffold-based automatic mapping of disulfide bonds to define a 3D profile of the chemokine protein family. By applying our methodology to all currently uncharacterized human protein sequences, we have discovered two novel proteins that, without having significant sequence similarity to known chemokines or characteristic cysteine patterns, show strong structural resemblance to known anti-HIV chemokines. Detailed computational analysis and experimental structural investigations based on mass spectrometry and circular dichroism support our structural predictions and highlight several other chemokine-like features. The results obtained support their functional annotation as putative novel chemokines and encourage further experimental characterization. The identification of remote homologs of human chemokines may provide new insights into the molecular mechanisms causing pathologies such as cancer or AIDS, and may contribute to the development of novel treatments. In addition, the genome-wide applicability of our methodology based on 3D protein family profiles may open up new possibilities for improving and accelerating protein function annotation processes.

12.
Uncontrolled protein aggregation is a constant challenge in all compartments of living organisms. The failure of a peptide or protein to remain soluble often results in pathology. So far, more than 40 human diseases have been associated with the formation of extracellular fibrillar aggregates - known as amyloid fibrils - or structurally related intracellular deposits. It is well known that molecular chaperones and elaborate quality control mechanisms exist in the cell to counteract aggregation. However, an increasing number of reports during the past few years indicate that proteins have also evolved structural and sequence-based strategies to prevent aggregation. This review describes these strategies and the selection pressures that exist on protein sequences to combat their uncontrolled aggregation. We will describe the different types of mechanism evolved by proteins that adopt different conformational states including normally folded proteins, intrinsically disordered polypeptide chains, elastomeric systems and multimodular proteins.

13.
The structural comparison of two proteins comes up in many applications in structural biology where it is often necessary to find similarities in very large conformation sets. This work describes techniques to achieve significant speedup in the computation of structural similarity between two given conformations, at the expense of introducing a small error in the similarity measure. Furthermore, the proposed computational scheme allows for a tradeoff between speedup and error. This scheme exploits the fact that the Cα representation of a protein conformation contains redundant information, due to the chain topology and limited compactness of proteins. This redundancy can be reduced by approximating subchains of a protein by their centers of mass, resulting in a smaller number of points to describe a conformation. A Haar wavelet analysis of random chains and proteins is used to justify this approximated representation. Similarity measures computed with this representation are highly correlated with the measures computed with the original Cα representation. Therefore, they can be used in applications where small similarity errors can be tolerated or as fast filters in applications that require exact measures. Computational tests have been conducted on two applications, nearest neighbor search and automatic structural classification.
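The reduction step described in the abstract above, replacing subchains by their centers of mass, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `coarse_grain` and the subchain length `m` are hypothetical, all Cα atoms are assumed to have equal mass (so the center of mass is the plain average), and a final subchain shorter than `m` is averaged over the points it contains. It is not the authors' implementation.

```python
def coarse_grain(coords, m):
    """Replace each run of m consecutive 3D points by its average position.

    coords: list of (x, y, z) tuples, one per C-alpha atom, in chain order.
    m: number of consecutive residues per subchain (assumed parameter).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    reduced = []
    for start in range(0, len(coords), m):
        chunk = coords[start:start + m]  # the last chunk may be shorter than m
        n = len(chunk)
        reduced.append(tuple(sum(p[axis] for p in chunk) / n for axis in range(3)))
    return reduced


# Four hypothetical C-alpha positions reduced to two points with m = 2.
chain = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 2, 0)]
print(coarse_grain(chain, 2))  # [(1.0, 0.0, 0.0), (5.0, 1.0, 0.0)]
```

Larger values of `m` give fewer points and faster comparisons at the cost of a larger approximation error, which is the tradeoff the abstract refers to.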

14.
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures were measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomic approach, as well as for genome annotation.

15.
Interaction-site prediction for protein complexes: a critical assessment
MOTIVATION: Proteins function through interactions with other proteins and biomolecules. Protein-protein interfaces hold key information toward molecular understanding of protein function. In the past few years, there have been intensive efforts in developing methods for predicting protein interface residues. A review that presents the current status of interface prediction, gives an overview of its applications and projects future developments is in order. SUMMARY: Interface prediction methods rely on a wide range of sequence, structural and physical attributes that distinguish interface residues from non-interface surface residues. The input data are manipulated into either a numerical value or a probability representing the potential for a residue to be inside a protein interface. Predictions are now satisfactory for complex-forming proteins that are well represented in the Protein Data Bank, but less so for under-represented ones. Future developments will be directed at tackling problems such as building structural models for multi-component structural complexes.

16.
Structural genomics efforts have led to increasing numbers of novel, uncharacterized protein structures with low sequence identity to known proteins, resulting in a growing need for structure-based function recognition tools. Our method, SeqFEATURE, robustly models protein functions described by sequence motifs using a structural representation. We built a library of models that shows good performance compared to other methods. In particular, SeqFEATURE demonstrates significant improvement over other methods when sequence and structural similarity are low.

17.
Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the set up of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in absence of homology.

18.
The dielectric properties of proteins are poorly understood and difficult to describe quantitatively. This limits the accuracy of methods for structure-based calculation of electrostatic energies and pKa values. The pKa values of many internal groups report apparent protein dielectric constants of 10 or higher. These values are substantially higher than the dielectric constants of 2-4 measured experimentally with dry proteins. The structural origins of these high apparent dielectric constants are not well understood. Here we report on structural and equilibrium thermodynamic studies of the effects of pH on the V66D variant of staphylococcal nuclease. In a crystal structure of this protein the neutral side chain of Asp-66 is buried in the hydrophobic core of the protein and hydrated by internal water molecules. Asp-66 titrates with a pKa value near 9. A decrease in the far UV-CD signal was observed, concomitant with ionization of this aspartic acid, and consistent with the loss of 1.5 turns of alpha-helix. These data suggest that the protein dielectric constant needed to reproduce the pKa value of Asp-66 with continuum electrostatics calculations is high because the dielectric constant has to capture, implicitly, the energetic consequences of the structural reorganization that are not treated explicitly in continuum calculations with static structures.

19.
BACKGROUND: In recent years, the determination of large numbers of protein structures has created a need for automatic and objective methods for the comparison of structures or conformations. Many protein structures show similarities of conformation that are undetectable by comparing their sequences. Comparison of structures can reveal similarities between proteins thought to be unrelated, providing new insight into the interrelationships of sequence, structure and function. RESULTS: Using a new tool that we have developed to perform rapid structural alignment, we present the highlights of an exhaustive comparison of all pairs of protein structures in the Brookhaven protein database. Notably, we find that the DNA-binding domain of the bacteriophage repressor family is almost completely embedded in the larger eight-helix fold of the globin family of proteins. The significant match of specific residues is correlated with functional, structural and evolutionary information. CONCLUSION: Our method can help to identify structurally similar folds rapidly and with high sensitivity, providing a powerful tool for analyzing the ever-increasing number of protein structures being elucidated.

20.
MOTIVATION: Although many methods are available for the identification of structural domains from protein three-dimensional structures, accurate definition of protein domains and the curation of such data for a large number of proteins are often possible only after manual intervention. The availability of domain definitions for protein structural entries is useful for the sequence analysis of aligned domains, structure comparison, fold recognition procedures and understanding protein folding, domain stability and flexibility. RESULTS: We have improved our method of domain identification starting from the concept of clustering secondary structural elements, but with the intention of reducing the number of discontinuous segments in identified domains. The results of our modified and automatic approach have been compared with the domain definitions from other databases. On a test data set of 55 proteins, this method achieves high agreement (88%) in the number of domains with the crystallographers' definition and resources such as SCOP, CATH, DALI, 3Dee and PDP databases. This method also obtains a 98% overlap score with the other resources in the definition of domain boundaries of the 55 proteins. We have examined the domain arrangements of 4592 non-redundant protein chains using the improved method to include 5409 domains, leading to an update of the structural domain database. AVAILABILITY: The latest version of the domain database and online domain identification methods are available from http://www.ncbs.res.in/~faculty/mini/ddbase/ddbase.html Supplementary information: http://www.ncbs.res.in/~faculty/mini/ddbase/supplementary/supplementary.html

