Similar literature
20 similar articles found.
1.
We have evaluated the effect of lysine guanidination in peptides and proteins on the dissociation of protonated ions in the gas phase. The dissociation of guanidinated model peptide ions compared to their unmodified forms showed behavior consistent with concepts of proton mobility as a major factor in determining favored fragmentation channels. Reduction of proton mobility associated with lysine guanidination was reflected by a relative increase in cleavages occurring C-terminal to aspartic acid residues as well as increases in small molecule losses. To evaluate the effect of guanidination on the dissociation behavior of whole protein ions, bovine ubiquitin was selected as a model. Essentially all of the amide bond cleavages associated with the +10 charge state of fully guanidinated ubiquitin were observed to occur C-terminal to aspartic acid residues, unlike the dissociation behavior of the +10 ion of the unmodified protein, where competing cleavage N-terminal to proline and nonspecific amide bond cleavages were also observed. The +8 and lower charge states of the guanidinated protein showed prominent losses of small neutral molecules. This overall fragmentation behavior is consistent with current hypotheses regarding whole protein dissociation that consider proton mobility and intramolecular charge solvation as important factors in determining favored dissociation channels, and is also consistent with the fragmentation behaviors observed for the guanidinated model peptide ions. Further evaluation of the utility of condensed phase guanidination of whole proteins is necessary, but the results described here confirm that guanidination can be an effective strategy for enhancing C-terminal aspartic acid cleavages. Gas phase dissociation exclusively at aspartic acid residues, especially for whole protein ions, could be useful in identifying and characterizing proteins via tandem mass spectrometry of whole protein ions.
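To make "cleavage C-terminal to aspartic acid" concrete, the following minimal sketch enumerates the fragment pairs that such cleavages would produce from a one-letter sequence. The function name and the example sequence are hypothetical illustrations, not taken from the abstract, and the sketch ignores charge state and all other fragmentation channels.

```python
def asp_cleavage_fragments(sequence):
    """Return (N-terminal, C-terminal) fragment pairs for cleavage of the
    amide bond immediately C-terminal to each aspartic acid (D) residue.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for i, residue in enumerate(sequence):
        # A C-terminal Asp has no amide bond after it, so it is skipped.
        if residue == "D" and i < len(sequence) - 1:
            pairs.append((sequence[: i + 1], sequence[i + 1:]))
    return pairs
```

For a made-up peptide such as "GADLKDPR", the function returns one pair per internal Asp, so a spectrum dominated by this channel would contain only that small, predictable set of fragments.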

2.
Identification of proteins from the mass spectra of peptide fragments generated by proteolytic cleavage using database searching has become one of the most powerful techniques in proteome science, capable of rapid and efficient protein identification. Using computer simulation, we have studied how the application of chemical derivatisation techniques may improve the efficiency of protein identification from mass spectrometric data. These approaches enhance ion yield and lead to the promotion of specific ions and fragments, yielding additional database search information. The impact of three alternative techniques has been assessed by searching representative proteome databases for both single proteins and simple protein mixtures. For example, by reliably promoting fragmentation of singly-charged peptide ions at aspartic acid residues after homoarginine derivatisation, 82% of yeast proteins can be unambiguously identified from a single typical peptide-mass datum, with a measured mass accuracy of 50 ppm, by using the associated secondary ion data. The extra search information also provides a means to confidently identify proteins in protein mixtures where only limited data are available. Furthermore, the inclusion of limited sequence information for the peptides can compensate for, and even exceed, the search efficiency available via high accuracy searches of around 5 ppm, suggesting that this is a potentially useful approach for simple protein mixtures routinely obtained from two-dimensional gels.
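The 50 ppm and 5 ppm figures above are relative mass tolerances. As a rough illustration of what they mean for a database search, this sketch converts a ppm tolerance into an absolute mass window and checks whether an observed mass matches a theoretical one. The function names and the example masses are assumptions made here for illustration; the abstract does not describe its search code.

```python
def ppm_window(mass, tolerance_ppm):
    """Return the (low, high) mass window for a relative tolerance in ppm."""
    delta = mass * tolerance_ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(observed, theoretical, tolerance_ppm):
    """True if the observed mass lies within tolerance_ppm of the theoretical mass."""
    error_ppm = abs(observed - theoretical) / theoretical * 1e6
    return error_ppm <= tolerance_ppm
```

For a 2000 Da peptide, 50 ppm corresponds to a window of about ±0.1 Da, while 5 ppm corresponds to about ±0.01 Da, so the looser tolerance admits roughly ten times as wide a range of candidate peptide masses.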

3.
One of the challenges associated with large-scale proteome analysis using tandem mass spectrometry (MS/MS) and automated database searching is to reduce the number of false positive identifications without sacrificing the number of true positives found. In this work, a systematic investigation of the effect of 2MEGA labeling (N-terminal dimethylation after lysine guanidination) on the proteome analysis of a membrane fraction of an Escherichia coli cell extract by 2-dimensional liquid chromatography MS/MS is presented. By a large-scale comparison of MS/MS spectra of native peptides with those from the 2MEGA-labeled peptides, the labeled peptides were found to undergo facile fragmentation with enhanced a1 or a1-related (a1-17 and a1-45) ions derived from all N-terminal amino acids in the MS/MS spectra; these ions are usually difficult to detect in the MS/MS spectra of nonderivatized peptides. The 2MEGA labeling alleviated the biased detection of arginine-terminated peptides that is often observed in MALDI and ESI MS experiments. 2MEGA labeling was found not only to increase the number of peptides and proteins identified but also to generate enhanced a1 or a1-related ions as a constraint to reduce the number of false positive identifications. In total, 640 proteins were identified from the E. coli membrane fraction, with each protein identified based on peptide mass and sequence match of one or more peptides using the MASCOT database search algorithm from the MS/MS spectra generated by a quadrupole time-of-flight mass spectrometer. Among them, the subcellular locations of 336 proteins are presently known, including 258 membrane and membrane-associated proteins (76.8%). Among the classified proteins, there was a dramatic increase in the total number of integral membrane proteins identified in the 2MEGA-labeled sample (153 proteins) versus the unlabeled sample (77 proteins).

4.
We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE, which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers an entire ORF product are rare. When the portion of each ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence; the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein has increased by 5% for archaebacteria, and by 7% for eubacteria, in the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics.

5.
We have created a database of two-domain proteins with homology less than 25% (452 proteins). Based on one half of this set of proteins, statistics on the appearance of amino acid residues at the domain boundaries of multiple-domain proteins have been obtained. Small and hydrophilic amino acids (proline, glycine, asparagine, glutamic acid, arginine and others) appear at the domain boundaries more often than in the whole protein. Conversely, hydrophobic amino acid residues (tryptophan, methionine, phenylalanine and others) appear at the domain boundaries more rarely. The scales of amino acid appearance in the boundary regions obtained from these statistics have been used to calculate domain boundaries in the proteins of the second half of the database. The probability scale obtained by averaging the appearance of amino acid residues over a domain boundary region of 8 residues (+/-4 residues from the real domain boundary) gives the best result: for 57% of proteins the predicted boundary was closer than 40 residues to the boundary assigned from three-dimensional structures, and for 41% it was closer than 20 residues to the real boundary. The probability scale was used to predict domain boundaries for proteins with unknown three-dimensional structure (international competition CASP6).
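The boundary prediction described above can be pictured as a sliding-window average of a per-residue propensity scale. The sketch below is only an illustration under stated assumptions: the scale values in TOY_SCALE are invented (the abstract gives no numbers), residues absent from it default to 1.0, and taking the highest-scoring window is an assumed decision rule rather than the authors' documented procedure.

```python
# Invented boundary propensities for illustration; NOT the published scale.
TOY_SCALE = {"P": 1.4, "G": 1.3, "N": 1.2, "E": 1.2, "R": 1.1,
             "W": 0.6, "M": 0.7, "F": 0.7}
DEFAULT_PROPENSITY = 1.0


def boundary_profile(sequence, half_window=4, scale=TOY_SCALE):
    """Average propensity over the 2 * half_window residues around each
    candidate boundary b (the bond between residues b-1 and b, 0-based)."""
    if len(sequence) < 2 * half_window:
        raise ValueError("sequence is shorter than the averaging window")
    scores = [scale.get(residue, DEFAULT_PROPENSITY) for residue in sequence]
    profile = []
    for b in range(half_window, len(sequence) - half_window + 1):
        window = scores[b - half_window: b + half_window]
        profile.append((b, sum(window) / len(window)))
    return profile


def predict_boundary(sequence, half_window=4, scale=TOY_SCALE):
    """Return the candidate boundary position with the highest window average."""
    return max(boundary_profile(sequence, half_window, scale),
               key=lambda item: item[1])[0]
```

With half_window=4 the window spans 8 residues, matching the +/-4 region quoted in the abstract. A real implementation would use the published propensities and would need a rule for sequences with no clear maximum.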

6.
A novel database search algorithm is presented for the qualitative identification of proteins over a wide dynamic range, both in simple and complex biological samples. The algorithm has been designed for the analysis of data originating from data independent acquisitions, whereby multiple precursor ions are fragmented simultaneously. Measurements used by the algorithm include retention time, ion intensities, charge state, and accurate masses on both precursor and product ions from LC-MS data. The search algorithm uses an iterative process whereby each iteration incrementally increases the selectivity, specificity, and sensitivity of the overall strategy. Increased specificity is obtained by utilizing a subset database search approach, whereby for each subsequent stage of the search, only those peptides from securely identified proteins are queried. Tentative peptide and protein identifications are ranked and scored by their relative correlation to a number of models of known and empirically derived physicochemical attributes of proteins and peptides. In addition, the algorithm utilizes decoy database techniques for automatically determining the false positive identification rates. The search algorithm has been tested by comparing the search results from a four-protein mixture, the same four-protein mixture spiked into a complex biological background, and a variety of other “system” type protein digest mixtures. The method was validated independently by data dependent methods, while concurrently relying on replication and selectivity. Comparisons were also performed with other commercially and publicly available peptide fragmentation search algorithms. The presented results demonstrate the ability to correctly identify peptides and proteins from data independent acquisition strategies with high sensitivity and specificity. They also illustrate a more comprehensive analysis of the samples studied, providing approximately 20% more protein identifications, compared to a more conventional data directed approach using the same identification criteria, with a concurrent increase in both sequence coverage and the number of modified peptides.
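The abstract mentions decoy database techniques for estimating false positive rates but does not give the estimator. As a hedged sketch, one common and simple form divides the number of decoy matches above a score threshold by the number of target matches above the same threshold. The function name, the input layout, and the choice of this particular estimator are assumptions made here, not details from the paper.

```python
def estimate_false_positive_rate(matches, score_threshold):
    """Estimate the false positive rate at a score threshold.

    matches: iterable of (score, is_decoy) tuples, one per identification.
    Uses the simple decoys/targets ratio; other estimators exist.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score < score_threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0  # nothing accepted, so no rate to report
    return decoys / targets
```

Raising the threshold usually lowers the estimated rate at the cost of fewer accepted identifications, which is the trade-off an iterative search of the kind described has to manage.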

7.
Repetitiousness is often observed in the primary and tertiary structures of proteins. We are intrigued by the potential role played by periodicity in the evolution of proteins and have created artificial repetitious proteins from repeats of short DNA sequences (microgenes). In this paper we characterize the physicochemical properties of six such artificially created proteins, which are the translated products of repeats of three microgenes. Three of the six proteins contain beta-sheet-like structures and are rather hydrophobic in nature. These proteins form macroscopic membranous structures in the presence of monovalent cations, suggesting they have the capacity to promote strong intermolecular interactions. Of the other three proteins, one is composed of alpha-helices and two have disordered structures. Small angle X-ray scattering analysis indicates that the artificial proteins do not fold as tightly as natural proteins, but are more compact than if completely denatured. One alpha-helical protein whose microgene unit was designed from coiled coil proteins was crystallized, demonstrating that repetitious artificial proteins can undergo transition to a more ordered state under appropriate conditions. Application of this approach to the development of a novel protein engineering system is discussed.

8.
Identifying common local segments, also called motifs, in multiple protein sequences plays an important role in establishing homology between proteins. Homology is easy to establish when sequences are similar (sharing an identity > 25%). However, for distant proteins, it is much more difficult to align motifs that are not similar in sequence but still share common structures or functions. This paper is a first attempt to align multiple protein sequences using both primary and secondary structure information. A new sequence model is proposed that assigns high probabilities not only to motifs that contain conserved amino acids but also to motifs that present common secondary structures. The proposed method is tested on the structural alignment database BAliBASE. We show that information brought by the predicted secondary structures greatly improves motif identification. A website of this program is available at www.stat.purdue.edu/~junxie/2ndmodel/sov.html.

9.
Human articular-cartilage link proteins are resolved into three components by sodium dodecyl sulphate/polyacrylamide-gel electrophoresis, indicative of three different structures. The action of the proteinase clostripain yields a single link-protein component with electrophoretic properties analogous to those of the smallest (most mobile) native link protein, suggesting that this link protein may be derived naturally from one or both of the larger molecules by proteolytic cleavage in situ. Upon chemical deglycosylation of native link protein two components are resolved, suggesting that two of the link proteins differ only in their degree and/or type of oligosaccharide substitution. This pattern is compatible with a proteolytic origin for the smallest link protein. During aging further proteolytic fragmentation occurs, though it is only apparent on reduction of disulphide bonds. This fragmentation occurs at identical sites in all three native link proteins, indicating the existence of a large region common to all the link proteins, which appears to consist predominantly of the C-terminal half of the molecules. These observations are compatible with the variation in oligosaccharide and proteolytic heterogeneity occurring at the N-terminus of the link proteins.

10.
《Journal of molecular biology》2019,431(13):2460-2466
PhyreRisk is an open-access, publicly accessible web application for interactively bridging genomic, proteomic and structural data, facilitating the mapping of human variants onto protein structures. A major advance over other tools for sequence-structure variant mapping is that PhyreRisk provides information on 20,214 human canonical proteins and an additional 22,271 alternative protein sequences (isoforms). Specifically, PhyreRisk provides structural coverage (partial or complete) for 70% (14,035 of 20,214 canonical proteins) of the human proteome, by storing 18,874 experimental structures and 84,818 pre-built models of canonical proteins and their isoforms generated using our in-house Phyre2. PhyreRisk reports 55,732 experimentally multi-validated protein interactions from IntAct and 24,260 experimental structures of protein complexes. Another major feature of PhyreRisk is that, rather than presenting a limited set of precomputed variant-structure mappings of known genetic variants, it allows the user to explore novel variants using, as input, genomic coordinate formats (Ensembl, VCF, reference SNP ID and HGVS notations) and Human Build GRCh37 and GRCh38. PhyreRisk also supports mapping variants using amino acid coordinates and searching for genes or proteins of interest. PhyreRisk is designed to empower researchers to translate genetic data into protein structural information, thereby providing a more comprehensive appreciation of the functional impact of variants. PhyreRisk is freely available at http://phyrerisk.bc.ic.ac.uk

11.
The development of tools for the analysis of global gene expression is vital for the optimal exploitation of the data on parasite genomes that are now being generated in abundance. Recent advances in two-dimensional electrophoresis (2-DE), mass spectrometry and bioinformatics have greatly enhanced the possibilities for mapping and characterisation of protein populations. We have employed these developments in a proteomics approach for the analysis of proteins expressed in the tachyzoite stage of Toxoplasma gondii. Over 1000 polypeptides were reproducibly separated by high-resolution 2-DE using the pH ranges 4-7 and 6-11. Further separations using narrow range gels suggest that at least 3000-4000 polypeptides should be resolvable by 2-DE using multiple single pH unit gels. Mass spectrometry was used to characterise a variety of protein spots on the 2-DE gels. Peptide mass fingerprints, acquired by matrix-assisted laser desorption/ionisation-(MALDI) mass spectrometry, enabled unambiguous protein identifications to be made where full gene sequence information was available. However, interpretation of peptide mass fingerprint data using the T. gondii expressed sequence tag (EST) database was less reliable. Peptide fragmentation data, acquired by post-source decay mass spectrometry, proved a more successful strategy for the putative identification of proteins using the T. gondii EST database and protein databases from other organisms. In some instances, several protein spots appeared to be encoded by the same gene, indicating that post-translational modification and/or alternative splicing events may be a common feature of functional gene expression in T. gondii. The data demonstrate that proteomic analyses are now viable for T. gondii and other protozoa for which there are good EST databases, even in the absence of complete genome sequence. Moreover, proteomics is of great value in interpreting and annotating EST databases.

12.
The advent of whole genome sequencing has led to an increasing number of proteins with known amino acid sequences. Despite many efforts, the number of proteins with resolved three-dimensional structures is still low. One of the challenging tasks structural biologists face is predicting the interaction of a metal ion with a protein whose structure is unknown. Based on the information available in the Protein Data Bank, a site (METALACTIVE INTERACTION) has been generated which displays information on significant high-preferential and low-preferential combinations of endogenous ligands for 49 metal ions. Users can also obtain information about the residues present in the first and second coordination spheres, as these play a major role in maintaining the structure and function of metalloproteins in biological systems. In this paper, a novel computational tool (ZINCCLUSTER) is developed, which can predict the zinc binding sites of proteins even if only the primary sequence is known. The purpose of this tool is to predict the active site cluster of an uncharacterized protein based on its primary sequence or a 3D structure. The tool can predict amino acids interacting with a metal or vice versa. It is based on the occurrence of significant triplets, and in testing it showed higher prediction accuracy than other available techniques.

13.
Torshin IY, Harrison RW 《Proteins》2001,43(4):353-364
Electrostatic interactions are important for protein folding. At low resolution, the electrostatic field of the whole molecule can be described in terms of charge center(s). To study electrostatic effects, the centers of positive and negative charge were calculated for 20 small proteins of known structure, for which hydrogen exchange cores had been determined experimentally. Two observations seem to be important. First, in all 20 proteins studied 30-100% of the residues forming hydrogen exchange core(s) were clustered around the charge centers. Moreover, in each protein more than half of the core sequences are located near the centers of charge. Second, the general architecture of all-alpha proteins from the set seems to be stabilized by interactions of residues surrounding the charge centers. In most of the alpha-beta proteins, either or both of the centers are located near a pair of consecutive strands, and this is even more characteristic for alpha/beta and all-beta structures. Consecutive strands are very probable sites of early folding events. These two points lead to the conclusion that charge centers, defined solely from the structure of the folded protein, may indicate the location of a protein's hydrogen exchange/folding core. In addition, almost all the proteins contain well-conserved continuous hydrophobic sequences of three or more residues located in the vicinity of the charge centers. These hydrophobic sequences may be primary nucleation sites for protein folding. The results suggest the following scheme for the order of events in folding: local hydrophobic nucleation, electrostatic collapse of the core, global hydrophobic collapse, and slow annealing to the native state. This analysis emphasizes the importance of treating electrostatics during protein-folding simulations.
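The abstract does not say how the charge centers were computed. One simple reading, shown here purely as an assumption, is to place a unit charge on each Lys/Arg and each Asp/Glu residue and take the centroid of each group. The residue sets, the input layout (residue name plus one representative coordinate), and the function names are all hypothetical.

```python
POSITIVE_RESIDUES = {"LYS", "ARG"}  # assumed +1 charges
NEGATIVE_RESIDUES = {"ASP", "GLU"}  # assumed -1 charges


def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y, z) points."""
    count = len(points)
    return tuple(sum(point[axis] for point in points) / count
                 for axis in range(3))


def charge_centers(residues):
    """Return (positive_center, negative_center) for a structure.

    residues: list of (residue_name, (x, y, z)) with one representative
    coordinate per residue. A center is None if no residue carries that charge.
    """
    positive = [xyz for name, xyz in residues if name in POSITIVE_RESIDUES]
    negative = [xyz for name, xyz in residues if name in NEGATIVE_RESIDUES]
    return (centroid(positive) if positive else None,
            centroid(negative) if negative else None)
```

A fuller treatment would weight by partial charges, include histidine and the chain termini, and use side-chain atom positions. This sketch only shows the low-resolution idea of summarizing the field by two points.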

14.
C Sander, R Schneider 《Proteins》1991,9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.

15.
Phosphate is one of the most frequently exploited chemical moieties in nature, present in a wide range of naturally occurring and critically important small molecules. Several phosphate group recognition motifs have been found for a few narrow groups of proteins, but for many protein families and folds the mode of phosphate recognition remains unclear. Here, we have analyzed the structures of all fold-representative protein-ligand complexes listed in the FSSP database, regardless of whether the bound ligand included a phosphate group. Based on a phosphate-binding motif that we identified in pyridoxal phosphate binding proteins, we have identified a new anion-binding structural motif, CalphaNN, common to 104 fold-representative protein structures that belong to 62 different folds, of which 86% of the fold-representative structures (51 folds) bind phosphate or lone sulfate ions. This motif leads to a precise mode for phosphate group recognition, forming a structure where atoms of the phosphate group occupy the most favorable stabilizing positions. The anion-binding CalphaNN motif is based only on main-chain atoms from three adjacent residues, has a conserved beta-alpha-alpha or beta-alpha-beta geometry, and recognizes the free phosphate (sulfate) ion as well as one or more phosphate groups in nucleotides and in a variety of cofactors. Moreover, the CalphaNN motif is positioned in functionally important regions of protein structures, and often residues of the motif directly participate in the function of the protein.

16.
To analyze the interrelationships between the amino acid sequences of the proteins of hepatitis C virus and the functional characteristics of different variants of this virus, a database of protein functional mapping of hepatitis C virus was developed. The database contains amino acid sequences (both full-length and fragmentary) retrieved from accessible databases and experimental data published in the literature. The database also contains the results of comparison and processing of primary data, including alignments and functional regions. On the basis of these data, variable and conserved regions of envelope proteins of hepatitis C virus were revealed. Antigenic and functional maps of structural and nonstructural proteins of the virus were constructed. The most variable region of the envelope protein E2 (HVR1) was analysed. It is assumed that the conservation of some amino acid positions of HVR1 is related to the functions of this region.

17.
Disulphide bonds in proteins are known to play diverse roles ranging from folding to structure to function. Thorough knowledge of the conservation status and structural state of the disulphide bonds will help in understanding the differences between homologous proteins. Here we present a database for the analysis of conservation and conformation of disulphide bonds in SCOP structural families. This database has a wide range of applications including mapping of disulphide bond mutation patterns, identification of disulphide bonds important for folding and stabilization, modeling of protein tertiary structures, and protein engineering. The database can be accessed at: http://bioinformatics.univ-reunion.fr/analycys/.

18.
Proteins of the nucleic acid-binding proteins superfamily perform such functions as processing, transport, storage, stretching, translation, and degradation of RNA. It is one of the 16 superfamilies containing the OB-fold in protein structures. Here, we have analyzed the superfamily of nucleic acid-binding proteins (the number of sequences exceeds 200,000) and found that this superfamily consists predominantly of proteins containing the cold shock DNA-binding domain (ca. 131,000 protein sequences). Proteins containing the S1 domain make up 57% of the cold shock DNA-binding domain family. Furthermore, we have found that the S1 domain was identified mainly in bacterial proteins (ca. 83%) compared to the eukaryotic and archaeal proteins available in the UniProt database. We have found that the number of repeats of the S1 domain in S1 domain-containing proteins depends on the taxonomic affiliation. All archaeal proteins contain one copy of the S1 domain, while the number of repeats in the eukaryotic proteins varies between 1 and 15 and correlates with the protein size. In the bacterial proteins, the number of repeats is no more than 6, regardless of the protein size. The large variation in the repeat number of the S1 domain, as one of the structural variants of the OB-fold, is a distinctive feature of S1 domain-containing proteins. Proteins from the other families and superfamilies either have one OB-fold or vary only slightly in repeat number. On the whole, it can be supposed that the repeat number is vital for the multifunctional activity of the S1 domain-containing proteins. Proteins 2017; 85:602–613. © 2016 Wiley Periodicals, Inc.

19.
A fragmentation geometry based upon axial acceleration of m/z-selected protein ions into a linear octopole ion trap allowed simultaneous production and external accumulation of fragment ions prior to m/z measurement in a FT mass spectrometer. Improved dynamic range resulting from this octopole collisionally activated dissociation resulted in a 2.5x increase in experimental throughput and a 2x increase in fragment ion matches to gene products identified and characterized in the top down fashion. The acceleration voltage for optimal fragmentation has an m/z and mass dependence, knowledge of which facilitated an automated platform for top down MS/MS on a quadrupole FT hybrid mass spectrometer. Controlled by improved software for data acquisition (e.g. using dynamic exclusion of previously identified species), automated octopole collisionally activated dissociation of samples fractionated using chromatofocusing and reversed-phase liquid chromatography achieved a significant increase in protein identification rate versus previous benchmarks. Also, a batch analysis version of ProSight PTM facilitated probability-based identification of intact proteins obtained in a higher throughput fashion. In total, 101 unique proteins (5-59 kDa) were identified from whole cell lysates of Methanosarcina acetivorans grown anaerobically, including the characterization of several mispredicted start sites and biologically relevant mass discrepancies.

20.
Newly determined protein structures are classified as belonging to a new fold if the structures are sufficiently dissimilar from all other protein structures known so far. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that the usage of nonsequential structure alignment tools, which neglect the polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool, GANGSTA, is specialized to perform nonsequential alignments with proper assignment of the secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding enhanced quality of structure alignments, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ to three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database that possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.
