Similar literature
20 similar articles found (search time: 15 ms)
1.

Background

A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, of which ~30% are explicitly annotated as "hypothetical" or "uncharacterized" proteins. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function.

Findings

We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated.
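The four comparison modes listed above can be pictured as different ways of expanding a query list against a target list. The following is a minimal sketch of that idea only, not CPASS code: the function name `comparison_pairs` and the site identifiers are hypothetical, and skipping self-comparisons is an assumption.

```python
from itertools import product


def comparison_pairs(queries, targets):
    """Return every (query, target) pair to compare, skipping self-comparisons."""
    return [(q, t) for q, t in product(queries, targets) if q != t]


# Hypothetical binding-site identifiers, not taken from the CPASS database.
database = ["siteA", "siteB", "siteC"]

pairwise = comparison_pairs(["siteA"], ["siteB"])                 # one site versus one site
one_vs_all = comparison_pairs(["siteA"], database)                # one site versus the whole database
one_vs_list = comparison_pairs(["siteA"], ["siteB", "siteC"])     # one site versus a chosen list
list_vs_list = comparison_pairs(["siteA", "siteB"], ["siteB", "siteC"])
```

Because each pair is independent of the others, a list of this kind could in principle be split into separate jobs, which is consistent with the distributed execution the abstract describes.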

Conclusions

CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores for false positives by ~30%, while leaving true positives unaffected. Importantly, receiver operating characteristic (ROC) curves demonstrate the high correlation between CPASS similarity scores and an accurate functional assignment. As indicated by distribution curves, scores ≥ 30% imply a functional similarity. Software URL: http://cpass.unl.edu.
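To make the ROC statement concrete, a single point on an ROC curve is the true-positive rate and false-positive rate obtained at one score cutoff. The sketch below computes that point at the 30% cutoff mentioned above. The scores and labels are invented for illustration and are not CPASS results; `roc_point` is a hypothetical helper.

```python
def roc_point(scores, labels, threshold):
    """Return (true-positive rate, false-positive rate) for scores >= threshold."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    return tp / positives, fp / negatives


# Invented similarity scores (in percent) and whether the functions truly match.
scores = [72.0, 45.5, 31.0, 28.0, 35.0, 12.0]
labels = [True, True, True, True, False, False]

tpr, fpr = roc_point(scores, labels, threshold=30.0)
print(tpr, fpr)  # 0.75 0.5
```

Sweeping the threshold across the full score range and plotting the resulting points would trace the complete curve.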

2.

Background

Functional similarity is challenging to identify when global sequence and structure similarity is low. Active-sites or functionally relevant regions are evolutionarily more stable relative to the remainder of a protein structure and provide an alternative means to identify potential functional similarity between proteins. We recently developed the FAST-NMR methodology to discover biochemical functions or functional hypotheses of proteins of unknown function by experimentally identifying ligand binding sites. FAST-NMR utilizes our CPASS software and database to assign a function based on a similarity in the structure and sequence of ligand binding sites between proteins of known and unknown function.

Methodology/Principal Findings

The PrgI protein from Salmonella typhimurium forms the needle complex in the type III secretion system (T3SS). A FAST-NMR screen identified a similarity between the ligand binding sites of PrgI and the Bcl-2 apoptosis protein Bcl-xL. These ligand binding sites correlate with known protein-protein binding interfaces required for oligomerization. Both proteins form membrane pores through this oligomerization to release effector proteins to stimulate cell death. Structural analysis indicates an overlap between the PrgI structure and the pore forming motif of Bcl-xL. A sequence alignment indicates conservation between the PrgI and Bcl-xL ligand binding sites and pore formation regions. This active-site similarity was then used to verify that chelerythrine, a known Bcl-xL inhibitor, also binds PrgI.

Conclusions/Significance

A structural and functional relationship between the bacterial T3SS and eukaryotic apoptosis was identified using our FAST-NMR ligand affinity screen in combination with a bioinformatic analysis based on our CPASS program. A similarity between PrgI and Bcl-xL is not readily apparent using traditional global sequence and structure analysis, and was identified only because of conservation in ligand binding sites. These results demonstrate the unique opportunity that ligand-binding sites provide for the identification of functional relationships when global sequence and structural information is limited.

3.
Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not been undertaken. Pyridoxal-5′-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the comparison of protein active site structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. Proteins 2014; 82:2597–2608. © 2014 Wiley Periodicals, Inc.

4.
The functional evolution of proteins advances through gene duplication followed by functional drift, whereas molecular evolution occurs through random mutational events. Over time, protein active-site structures or functional epitopes remain highly conserved, which enables relationships to be inferred between distant orthologs or paralogs. In this study, we present the first functional clustering and evolutionary analysis of the RCSB Protein Data Bank (RCSB PDB) based on similarities between active-site structures. All of the ligand-bound proteins within the RCSB PDB were scored using our Comparison of Protein Active-site Structures (CPASS) software and database (http://cpass.unl.edu/). Principal component analysis was then used to identify 4431 representative structures to construct a phylogenetic tree based on the CPASS comparative scores (http://itol.embl.de/shared/jcatazaro). The resulting phylogenetic tree identified a sequential, step-wise evolution of protein active-sites and provides novel insights into the emergence of protein function or changes in substrate specificity based on subtle changes in geometry and amino acid composition.

5.
6.
7.
The most abundant root proteins of ginseng (Panax ginseng) have been detected and identified by comparative proteome analysis with cultured hairy root of ginseng. Four abundant proteins (28, 26, 21 and 20 kDa) of P. ginseng had isoforms with different pI values on two-dimensional gel electrophoresis (2DE). The results of N-terminal and internal amino acid sequencing, however, showed that all of them originate from a 28 kDa protein, known as ginseng major protein (GMP). The GMP gene was searched for in the expressed sequence tag database of P. ginseng and found to encode a 27.3 kDa protein having 238 amino acid residues. Analysis of the amino acid sequences indicates that GMP exhibits high sequence homology with plant RNases and RNase-like proteins. However, purified GMP had no RNase activity even though it has conserved amino acid residues known to be essential for active sites of RNase. The GMPs present in ginseng main root were not expressed in cultured hairy roots of ginseng. 2DE analysis showed that the amounts of GMPs in main roots change according to seasonal fluctuation. These results suggest that the GMPs are root-specific RNase-like proteins, which function as vegetative storage proteins of ginseng for survival in the natural environment.

8.
Mass spectrometry was used in conjunction with gel electrophoresis and liquid chromatography to determine peptide sequences from American alligator (Alligator mississippiensis) leukocytes and to identify similar proteins based on homology. The goal of the study was to generate an initial database of proteins related to the alligator immune system. We have adopted a typical proteomics approach for this study. Proteins from leukocyte extracts were separated using two-dimensional gel electrophoresis and the major bands were excised, digested and analyzed by on-line nano-LC MS/MS to generate peptide sequences. The sequences generated were used to identify proteins and characterize their functions. The protein identity and characterization of the protein function were based on matching two or more peptides to the same protein by searching against the NCBI database using MASCOT and Basic Local Alignment Search Tool (BLAST). For those proteins with only one peptide matching, the phylum of the matched protein was considered. Forty-three proteins were identified that exhibit sequence similarities to proteins from other vertebrates. Proteins related to the cytoskeletal system were the most abundant proteins identified. These proteins are known to regulate cell mobility and phagocytosis. Several other peptides were matched to proteins that potentially have immune-related function.
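The identification rule described above (two or more peptides matching the same protein, with single-peptide matches set aside for further review) can be sketched as a simple grouping step. This is an illustration under assumptions, not the authors' pipeline: the function name, the peptide strings, and the protein names are hypothetical, and no MASCOT or BLAST call is made.

```python
from collections import defaultdict


def identify_proteins(peptide_hits, min_peptides=2):
    """Split matched proteins into confident and single-peptide groups.

    peptide_hits is an iterable of (peptide_sequence, matched_protein) pairs.
    A protein counts as confident when at least min_peptides distinct
    peptides matched it; the rest are returned for separate review.
    """
    peptides_by_protein = defaultdict(set)
    for peptide, protein in peptide_hits:
        peptides_by_protein[protein].add(peptide)
    confident = sorted(p for p, peps in peptides_by_protein.items()
                       if len(peps) >= min_peptides)
    needs_review = sorted(p for p, peps in peptides_by_protein.items()
                          if len(peps) < min_peptides)
    return confident, needs_review


# Hypothetical search results; the duplicate peptide is counted only once.
hits = [
    ("AGFAGDDAPR", "actin"),
    ("DLTDYLMK", "actin"),
    ("AGFAGDDAPR", "actin"),
    ("NTDGSTDYGILQINSR", "lysozyme"),
]
confident, needs_review = identify_proteins(hits)
```

Using a set per protein means repeated observations of the same peptide do not inflate the count, which is one reasonable reading of "two or more peptides".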

9.
10.
The ability to identify the functional correlates of structural and sequence variation in proteins is a critical capability. We related structures of influenza A N10 and N11 proteins that have no established function to structures of proteins with known function by identifying spatially conserved atoms. We identified atoms with common distributed spatial occupancy in PDB structures of N10 protein, N11 protein, an influenza A neuraminidase, an influenza B neuraminidase, and a bacterial neuraminidase. By superposing these spatially conserved atoms, we aligned the structures and associated molecules. We report spatially and sequence invariant residues in the aligned structures. Spatially invariant residues in the N6 and influenza B neuraminidase active sites were found in previously unidentified spatially equivalent sites in the N10 and N11 proteins. We found the corresponding secondary and tertiary structures of the aligned proteins to be largely identical despite significant sequence divergence. We found structural precedent in known non-neuraminidase structures for residues exhibiting structural and sequence divergence in the aligned structures. In N10 protein, we identified staphylococcal enterotoxin I-like domains. In N11 protein, we identified hepatitis E E2S-like domains, SARS spike protein-like domains, and toxin components shared by alpha-bungarotoxin, staphylococcal enterotoxin I, anthrax lethal factor, clostridium botulinum neurotoxin, and clostridium tetanus toxin. The presence of active site components common to the N6, influenza B, and S. pneumoniae neuraminidases in the N10 and N11 proteins, combined with the absence of apparent neuraminidase function, suggests that the role of neuraminidases in H17N10 and H18N11 emerging influenza A viruses may have changed. 
The presentation of E2S-like, SARS spike protein-like, or toxin-like domains by the N10 and N11 proteins in these emerging viruses may indicate that H17N10 and H18N11 sialidase-facilitated cell entry has been supplemented or replaced by sialidase-independent receptor binding to an expanded cell population that may include neurons and T-cells.

11.
The phloem transport system is a complex tissue that primarily carries photoassimilate from source to sink. Its function depends on anucleate sieve elements (SE) supported by companion cells (CC). In this study, SE sap was sampled and the protein identity of soluble proteins was determined with the aim of understanding the function of proteins within the conduit. Unlike in many plants, SE sap exudes from incisions in the bark of Ricinus communis and, although there is a greater possibility of contamination from tissues other than SE, sap can be obtained in sufficient quantities to separate proteins using 2D electrophoresis. Spots were excised for trypsin digest, then analysed by quadrupole time of flight (Q-TOF) mass spectrometry (MS) and database searched to determine sequence identity. Overall, 18 proteins were identified in the SE-enriched sap. Proteins identified that have not previously been identified directly from SE sap included a glycine-rich RNA-binding protein, metallothionein, phosphoglycerate mutase, and phosphopyruvate hydratase. The potential role of the identified proteins in SE function is discussed. The protein identification in this study provides a first step towards the goal of a greater understanding of the function of proteins within the SE.

12.
An automatic procedure is proposed to identify, from the protein sequence database, conserved amino acid patterns (or sequence motifs) that are exclusive to a group of functionally related proteins. This procedure is applied to the PIR database and a dictionary of sequence motifs that relate to specific superfamilies is constructed. The motifs have a practical relevance in identifying the membership of specific superfamilies without the need to perform sequence database searches in 20% of newly determined sequences. The sequence motifs identified represent functionally important sites on protein molecules. When multiple blocks exist in a single motif they are often close together in the 3-D structure. Furthermore, occasionally these motif blocks were found to be split by introns when the correlation with exon structures was examined.

13.
Most homologous pairs of proteins have no significant sequence similarity to each other and are not identified by direct sequence comparison or profile-based strategies. However, multiple sequence alignments of low similarity homologues typically reveal a limited number of positions that are well conserved despite diversity of function. It may be inferred that conservation at most of these positions reflects the contribution of these amino acids to the folding and stability of the protein. As such, these amino acids and their relative positions may define a structural signature. We demonstrate that extraction of this fold template provides the basis for the sequence database to be searched for patterns consistent with the fold, enabling identification of homologs that are not recognized by global sequence analysis. The fold template method was developed to address the need for a tool that could comprehensively search the midnight and twilight zones of protein sequence similarity without reliance on global statistical significance. Manual implementations of the fold template method were performed on three folds: immunoglobulin, c-lectin and TIM barrel. Following proof of concept of the template method, an automated version of the approach was developed. This automated fold template method was used to develop fold templates for 10 of the more populated folds in the SCOP database. The fold template method developed three-dimensional structural motifs or signatures that were able to return a diverse collection of proteins, while maintaining a low false positive rate. Although the results of the manual fold template method were more comprehensive than the automated fold template method, the diversity of the results from the automated fold template method surpassed those of current methods that rely on statistical significance to infer evolutionary relationships among divergent proteins.

14.
Genomics has posed the challenge of determination of protein function from sequence and/or 3-D structure. Functional assignment from sequence relationships can be misleading, and structural similarity does not necessarily imply functional similarity. Proteins in the DJ-1 family, many of which are of unknown function, are examples of proteins with both sequence and fold similarity that span multiple functional classes. THEMATICS (theoretical microscopic titration curves), an electrostatics-based computational approach to functional site prediction, is used to sort proteins in the DJ-1 family into different functional classes. Active site residues are predicted for the eight distinct DJ-1 proteins with available 3-D structures. Placement of the predicted residues onto a structural alignment for six of these proteins reveals three distinct types of active sites. Each type overlaps only partially with the others, with only one residue in common across all six sets of predicted residues. Human DJ-1 and YajL from Escherichia coli have very similar predicted active sites and belong to the same probable functional group. Protease I, a known cysteine protease from Pyrococcus horikoshii, and PfpI/YhbO from E. coli, a hypothetical protein of unknown function, belong to a separate class. THEMATICS predicts a set of residues that is typical of a cysteine protease for Protease I; the prediction for PfpI/YhbO bears some similarity. YDR533Cp from Saccharomyces cerevisiae, of unknown function, and the known chaperone Hsp31 from E. coli constitute a third group with nearly identical predicted active sites. While the first four proteins have predicted active sites at dimer interfaces, YDR533Cp and Hsp31 both have predicted sites contained within each subunit. Although YDR533Cp and Hsp31 form different dimers with different orientations between the subunits, the predicted active sites are superimposable within the monomer structures. 
Thus, the three predicted functional classes form four different types of quaternary structures. The computational prediction of the functional sites for protein structures of unknown function provides valuable clues for functional classification.

15.
A database was established from human hemofiltrate (HF) that consisted of a mass database and a sequence database, with the aim of analyzing the composition of the peptide fraction in human blood. To establish a mass database, all 480 fractions of a peptide bank generated from HF were analyzed by MALDI-TOF mass spectrometry. Using this method, over 20 000 molecular masses representing native, circulating peptides were detected. Estimation of repeatedly detected masses suggests that approximately 5000 different peptides were recorded. More than 95% of the detected masses are smaller than 15 000, indicating that HF predominantly contains peptides. The sequence database contains over 340 entries from 75 different protein and peptide precursors. Fifty-five percent of the entries are fragments from plasma proteins (fibrinogen A 13%, albumin 10%, β2-microglobulin 8.5%, cystatin C 7%, and fibrinogen B 6%). Seven percent of the entries represent peptide hormones, growth factors and cytokines. Thirty-three percent belong to protein families such as complement factors, enzymes, enzyme inhibitors and transport proteins. Five percent represent novel peptides of which some show homology to known peptide and protein families. The coexistence of processed peptide fragments, biologically active peptides and peptide precursors suggests that HF reflects the peptide composition of plasma. Interestingly, protein modules such as EGF domains (meprin Aα-fragments), somatomedin-B domains (vitronectin fragments), thyroglobulin domains (insulin like growth factor-binding proteins), and Kazal-type inhibitor domains were identified. Alignment of sequenced fragments to their precursor proteins and the analysis of their cleavage sites revealed that there are different processing pathways of plasma proteins in vivo.

16.
PROMISE: a database of bioinorganic motifs.
The PROMISE (prosthetic centres and metal ions in protein active sites) database aims to present comprehensive sequence, structural, functional and bibliographic information on metalloproteins and other complex proteins, with an emphasis on active site structure and function. The database is available on the World Wide Web at http://bioinf.leeds.ac.uk/promise/

17.
At synapses, the release of neurotransmitter is regulated by molecular machinery that aggregates at specialized presynaptic release sites termed active zones. The complement of active zone proteins at each site is a determinant of release efficacy and can be remodeled to alter synapse function. The small GTPase Rab3 was previously identified as playing a novel role that controls the distribution of active zone proteins to individual release sites at the Drosophila neuromuscular junction. Rab3 has been extensively studied for its role in the synaptic vesicle cycle; however, the mechanism by which Rab3 controls active zone development remains unknown. To explore this mechanism, we conducted a mutational analysis to determine the molecular and structural requirements of Rab3 function at Drosophila synapses. We find that GTP-binding is required for Rab3 to traffic to synapses and distribute active zone components across release sites. Conversely, the hydrolytic activity of Rab3 is unnecessary for this function. Through a structure-function analysis we identify specific residues within the effector-binding switch regions that are required for Rab3 function and determine that membrane attachment is essential. Our findings suggest that Rab3 controls the distribution of active zone components via a vesicle docking mechanism that is consistent with standard Rab protein function.

18.
Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between the ancestry and the functional signal is decoupled. The amino acids in these sites are masked from being recognized by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications in protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.

19.
Proteins are typically represented by discrete atomic coordinates providing an accessible framework to describe different conformations. However, in some fields proteins are more accurately represented as near-continuous surfaces, as these are imprinted with geometric (shape) and chemical (electrostatics) features of the underlying protein structure. Protein surfaces are dependent on their chemical composition and, ultimately determine protein function, acting as the interface that engages in interactions with other molecules. In the past, such representations were utilized to compare protein structures on global and local scales and have shed light on functional properties of proteins. Here we describe RosettaSurf, a surface-centric computational design protocol, that focuses on the molecular surface shape and electrostatic properties as means for protein engineering, offering a unique approach for the design of proteins and their functions. The RosettaSurf protocol combines the explicit optimization of molecular surface features with a global scoring function during the sequence design process, diverging from the typical design approaches that rely solely on an energy scoring function. With this computational approach, we attempt to address a fundamental problem in protein design related to the design of functional sites in proteins, even when structurally similar templates are absent in the characterized structural repertoire. Surface-centric design exploits the premise that molecular surfaces are, to a certain extent, independent of the underlying sequence and backbone configuration, meaning that different sequences in different proteins may present similar surfaces. We benchmarked RosettaSurf on various sequence recovery datasets and showcased its design capabilities by generating epitope mimics that were biochemically validated. 
Overall, our results indicate that the explicit optimization of surface features may lead to new routes for the design of functional proteins.

20.
Goyal K, Mande SC. Proteins 2008, 70(4):1206-1218
High throughput structural genomics efforts have been making the structures of proteins available even before their function has been fully characterized. Therefore, methods that exploit the structural knowledge to provide evidence about the functions of proteins would be useful. Such methods would be needed to complement the sequence-based function annotation approaches. The current study describes generation of 3D-structural motifs for metal-binding sites from the known metalloproteins. It then scans all the available protein structures in the PDB database for putative metal-binding sites. Our analysis predicted more than 1000 novel metal-binding sites in proteins using three-residue templates, and more than 150 novel metal-binding sites using four-residue templates. Prediction of a metal-binding site in the yeast protein YDR533c led to the hypothesis that it might function as a metal-dependent amidopeptidase. The structural motifs identified by our method present novel metal-binding sites that reveal newer mechanisms for a few well-known proteins.
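One simple way to picture a three-residue template scan is to compare the inter-residue distances of a candidate triplet with those of a template. The abstract does not state how the authors match templates, so the distance-based comparison below is an assumption made for illustration, and the coordinates, tolerance, and function names are hypothetical. A real search would also need to check residue types and handedness, which sorted distances alone ignore.

```python
from itertools import combinations
from math import dist  # available in Python 3.8 and later


def pairwise_distances(points):
    """Return the sorted distances between every pair of 3D points."""
    return sorted(dist(a, b) for a, b in combinations(points, 2))


def matches_template(candidate, template, tolerance=0.5):
    """True if each sorted candidate distance is within tolerance of the template's."""
    if len(candidate) != len(template):
        return False
    return all(abs(c - t) <= tolerance
               for c, t in zip(pairwise_distances(candidate),
                               pairwise_distances(template)))


# Hypothetical coordinates (in angstroms) for three coordinating atoms.
template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]    # distances 3, 4, 5
similar = [(1.0, 1.0, 1.0), (1.0, 4.2, 1.0), (5.0, 1.0, 1.0)]     # about 3.2, 4.0, 5.1
different = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (0.0, 8.0, 0.0)]   # 8, 8, about 11.3

print(matches_template(similar, template))    # True
print(matches_template(different, template))  # False
```

Comparing distances rather than raw coordinates makes the check independent of how each structure is positioned and oriented, so no prior superposition is needed.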
