Similar articles
 20 similar articles found (search time: 31 ms)
1.
MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. Because a mass spectrometer generates thousands of spectra in one hour, the speed of database searching is critical, especially when searching against a large sequence database, when the peptide is generated by an unknown or non-specific enzyme, or when the target peptides carry post-translational modifications (PTMs). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of them are due to peptides from non-specific digestion by unknown enzymes or to amino acid modifications. In other cases, scientists may choose non-specific enzymes such as pepsin or thermolysin for proteolysis in proteomic studies, because not all proteins are amenable to digestion by site-specific enzymes, and furthermore many digested peptides may not fall within the range of molecular weight suitable for mass spectrometry analysis. Interpreting mass spectra of these kinds costs database search engines a great deal of computational time. OVERVIEW: The present study was designed to speed up the database searching process for both cases. Specifically, we employed an approach combining a suffix tree data structure and a spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We design an efficient algorithm to compute a matching threshold at a given statistical significance level, e.g. p = 0.01, for each spectrum, and use it to select candidate peptides. We then rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of any modification to a protein. 
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.
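
The spectrum-graph idea in abstract 1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the input is an idealized ladder of prefix masses (real spectra mix several ion types and noise), uses only a subset of residue masses, and replaces the suffix-tree lookup with a plain substring search. The names `build_spectrum_graph`, `spell_paths`, and `candidates_in_database` are hypothetical.

```python
from itertools import combinations

# Monoisotopic residue masses in Da for a subset of amino acids
# (a real search would use all 20 and handle near-isobaric residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def build_spectrum_graph(prefix_masses, tol=0.02):
    """Connect two peaks when their mass difference equals one residue mass.

    Returns a list of edges (lighter_peak, heavier_peak, residue_letter).
    """
    edges = []
    for lo, hi in combinations(sorted(prefix_masses), 2):
        diff = hi - lo
        for aa, mass in RESIDUE_MASS.items():
            if abs(diff - mass) <= tol:
                edges.append((lo, hi, aa))
    return edges


def spell_paths(edges, start):
    """Enumerate the residue strings spelled by maximal paths from `start`."""
    outgoing = {}
    for lo, hi, aa in edges:
        outgoing.setdefault(lo, []).append((hi, aa))

    def walk(node):
        nxt = outgoing.get(node, [])
        if not nxt:
            return [""]
        return [aa + rest for hi, aa in nxt for rest in walk(hi)]

    return walk(start)


def candidates_in_database(tags, proteins):
    """Assumed stand-in for the suffix-tree search: plain substring lookup."""
    return [(tag, name) for tag in tags
            for name, seq in proteins.items() if tag in seq]
```

A suffix tree would answer the last step without scanning every protein for every tag, which is where the speed-up described in the abstract would come from.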

2.
A new database, SwePep, specifically designed for endogenous peptides, has been constructed to significantly speed up the identification process from complex tissue samples utilizing mass spectrometry. In the identification process the experimental peptide masses are compared with the peptide masses stored in the database both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides and can later be confirmed with tandem mass spectrometry data. Successful applications of this methodology are presented. The SwePep database is a relational database developed using MySQL and Java. The database contains 4180 annotated endogenous peptides from different tissues originating from 394 different species as well as 50 novel peptides from brain tissue identified in our laboratory. Information about the peptides, including mass, isoelectric point, sequence, and precursor protein, is also stored in the database. This new approach holds great potential for removing the bottleneck that occurs during the identification process in the field of peptidomics. The SwePep database is available to the public.
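
The mass comparison "with and without possible post-translational modifications" described in abstract 2 can be illustrated as follows. This is a sketch under stated assumptions, not SwePep's actual code or schema: the modification list, the tolerance, and the function names `peptide_mass` and `match_mass` are all hypothetical, and at most one modification per peptide is considered.

```python
WATER = 18.010565  # monoisotopic mass of H2O in Da

# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Illustrative mass shifts in Da; the set SwePep considers is not
# given in the abstract, so this list is an assumption.
MOD_SHIFT = {
    "none": 0.0,
    "amidation": -0.984016,
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
}


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def match_mass(observed, peptides, tol=0.01):
    """Return (sequence, modification) pairs whose mass is within `tol` Da."""
    hits = []
    for seq in peptides:
        base = peptide_mass(seq)
        for mod, shift in MOD_SHIFT.items():
            if abs(base + shift - observed) <= tol:
                hits.append((seq, mod))
    return hits
```

Hits from such a lookup are only candidates; as the abstract notes, they would still need confirmation with tandem mass spectrometry data.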

3.
Arg-Gly-Asp (RGD) peptides contain an aspartic acid residue that is highly susceptible to chemical degradation and leads to the loss of biological activity. Our hypothesis is that cyclization of RGD peptides via disulphide bond linkage can induce structural rigidity, thereby preventing degradation mediated by the aspartic acid residue. In this paper, we compared the solution stability of a linear peptide (Arg-Gly-Asp-Phe-OH; 1) and a cyclic peptide (cyclo-(1, 6)-Ac-Cys-Arg-Gly-Asp-Phe-Pen-NH2; 2) as a function of pH and buffer concentration. The decomposition of both peptides was studied in buffers ranging from pH 2-12 at 50 degrees C. Reversed-phase HPLC was used as the main tool in determining the degradation rates and pathways of both peptides. Fast atom bombardment mass spectrometry (FAB-MS), electrospray ionization mass spectrometry (ESI-MS), matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry, liquid chromatography-mass spectrometry (LC-MS), and one- and two-dimensional nuclear magnetic resonance spectroscopy (NMR) were used to characterize peptides 1 and 2 and their degradation products. In addition, co-elution with authentic samples was used to identify degradation products. Both peptides displayed pseudo-first-order kinetics at all pH values studied. The cyclic peptide 2 appeared to be 30-fold more stable than the linear peptide 1 at pH 7. The degradation mechanisms of linear (1) and cyclic (2) peptides primarily involved the aspartic acid residue. However, above pH 8 the stability of the cyclic peptide decreased dramatically due to disulphide bond degradation. Both peptides also exhibited a change in degradation mechanism upon an increase in pH. The increase in stability of cyclic peptide 2 compared to linear peptide 1, especially at neutral pH, may be due to decreased structural flexibility imposed by the ring. 
This rigidity would prevent the Asp side chain carboxylic acid from orientating itself in the appropriate position for attack on the peptide backbone.
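
The kinetic claims in abstract 3 can be written out explicitly. The relations below are the standard pseudo-first-order rate law; reading "30-fold more stable" as a ratio of observed rate constants at pH 7 is an interpretation, since the abstract does not say which quantity was compared. Here $[P]$ is the peptide concentration and $k_{\mathrm{obs}}$ the observed rate constant, with subscripts 1 and 2 for the linear and cyclic peptides.

```latex
\frac{d[P]}{dt} = -k_{\mathrm{obs}}\,[P]
\quad\Longrightarrow\quad
\ln\frac{[P]_t}{[P]_0} = -k_{\mathrm{obs}}\,t,
\qquad
t_{1/2} = \frac{\ln 2}{k_{\mathrm{obs}}}

% Assumed reading of "30-fold more stable" at pH 7:
\frac{k_{\mathrm{obs},1}}{k_{\mathrm{obs},2}} \approx 30
\quad\Longrightarrow\quad
t_{1/2,\,2} \approx 30\; t_{1/2,\,1}
```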

4.
Disulphide bonds in proteins are known to play diverse roles ranging from folding to structure to function. Thorough knowledge of the conservation status and structural state of the disulphide bonds will help in understanding the differences in homologous proteins. Here we present a database for the analysis of conservation and conformation of disulphide bonds in SCOP structural families. This database has a wide range of applications including mapping of disulphide bond mutation patterns, identification of disulphide bonds important for folding and stabilization, modeling of protein tertiary structures and protein engineering. The database can be accessed at: http://bioinformatics.univ-reunion.fr/analycys/.

5.
When analyzing proteins in complex samples using tandem mass spectrometry of peptides generated by proteolysis, the inference of proteins can be ambiguous, even with well-validated peptides. Unresolved questions include whether to show all possible proteins vs a minimal list, what to do when proteins are inferred ambiguously, and how to quantify peptides that bridge multiple proteins, each with distinguishing evidence. Here we describe IsoformResolver, a peptide-centric protein inference algorithm that clusters proteins in two ways, one based on peptides experimentally identified from MS/MS spectra, and the other based on peptides derived from an in silico digest of the protein database. MS/MS-derived protein groups report minimal list proteins in the context of all possible proteins, without redundantly listing peptides. In silico-derived protein groups pull together functionally related proteins, providing stable identifiers. The peptide-centric grouping strategy used by IsoformResolver allows proteins to be displayed together when they share peptides in common, providing a comprehensive yet concise way to organize protein profiles. It also summarizes information on spectral counts and is especially useful for comparing results from multiple LC-MS/MS experiments. Finally, we examine the relatedness of proteins within IsoformResolver groups and compare its performance to other protein inference software.
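
Grouping proteins "when they share peptides in common", as in abstract 5, amounts to finding connected components in a peptide-protein graph. The sketch below shows that generic idea with a union-find structure; it is not IsoformResolver's algorithm, which the abstract does not specify in detail, and the function name `group_proteins` and the input layout are assumptions.

```python
from collections import defaultdict


def group_proteins(peptide_to_proteins):
    """Cluster proteins linked, directly or transitively, by shared peptides.

    `peptide_to_proteins` maps each peptide to the proteins containing it.
    Returns a sorted list of sorted protein groups.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        for protein in proteins:
            find(protein)  # register proteins that share nothing
        for other in proteins[1:]:
            union(proteins[0], other)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())
```

Feeding the same function peptides from an in silico digest instead of identified peptides would give the second, experiment-independent grouping the abstract mentions.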

6.
Protein identification has been greatly facilitated by database searches against protein sequences derived from product ion spectra of peptides. This approach is primarily based on the use of fragment ion mass information contained in an MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions are used to provide additional discriminating power for much improved protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference of corresponding peak pairs in overlaid spectra of (15)N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in (15)N-enriched and normal media, respectively. It is demonstrated that, using 2D information, i.e., m/z and its associated nitrogen number, this method can not only confirm protein identification results generated by MS/MS database searching, but also identify peptides that cannot be identified by database searching alone. Examples are presented of analyzing Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from the digests of low abundance proteins, which can still give positive protein identification using this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and identification of posttranslational or other chemical modifications. We envision that this method should be particularly useful for proteome expression profiling of organelles or cells that can be grown in (15)N-enriched media.
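
The nitrogen number in abstract 6 follows from simple arithmetic: each nitrogen atom shifts the mass by the 15N/14N difference of about 0.997035 Da. The sketch below assumes complete labeling and correctly paired peaks; the function names are hypothetical, and the filtering step is one plausible use of the count, not the authors' published procedure.

```python
N15_SHIFT = 0.997035  # mass difference between 15N and 14N in Da

# Side-chain nitrogen atoms per residue; every residue also has one
# backbone nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def nitrogen_count(mz_unlabeled, mz_labeled, charge=1):
    """Estimate the number of nitrogen atoms from a labeled/unlabeled peak pair."""
    return round((mz_labeled - mz_unlabeled) * charge / N15_SHIFT)


def peptide_nitrogens(seq):
    """Count nitrogen atoms in an unmodified linear peptide."""
    return sum(1 + SIDE_CHAIN_N.get(aa, 0) for aa in seq)


def filter_by_nitrogen(candidates, observed_n):
    """Keep candidate sequences whose nitrogen count matches the observation."""
    return [seq for seq in candidates if peptide_nitrogens(seq) == observed_n]
```

The nitrogen count acts as a second dimension alongside m/z: two candidates with near-identical masses can often be separated because they contain different numbers of nitrogen atoms.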

7.
Completion of the Caenorhabditis elegans genome sequencing project in 1998 has provided more insight into the complexity of nematode neuropeptide signaling. Several C. elegans neuropeptide precursor genes, coding for approximately 250 peptides, have been predicted from the genomic database. One can, however, not deduce whether all these peptides are actually expressed, nor is it possible to predict all post-translational modifications. Using two dimensional nanoscale liquid chromatography combined with tandem mass spectrometry and database mining, we analyzed a mixed stage C. elegans extract. This peptidomic setup yielded 21 peptides derived from formerly predicted neuropeptide-like protein (NLP) precursors and 28 predicted FMRFamide-related peptides. In addition, we were able to sequence 11 entirely novel peptides derived from nine peptide precursors that were not predicted or identified in any way previously. Some of the identified peptides display profound sequence similarities with neuropeptides from other invertebrates, indicating that these peptides have a long evolutionary history.

8.
Informatics for protein identification by mass spectrometry   (cited by: 3; self-citations: 0; cited by others: 3)
High throughput protein analysis (i.e., proteomics) first became possible when sensitive peptide mass mapping techniques were developed, thereby allowing for the possibility of identifying and cataloging most 2D gel electrophoresis spots. Shortly thereafter a few groups pioneered the idea of identifying proteins by using peptide tandem mass spectra to search protein sequence databases. Hence, it became possible to identify proteins from very complex mixtures. One drawback to these latter techniques is that it is not entirely straightforward to make matches using tandem mass spectra of peptides that are modified or have sequences that differ slightly from what is present in the sequence database that is being searched. This has been part of the motivation behind automated de novo sequencing programs that attempt to derive a peptide sequence regardless of its presence in a sequence database. The sequence candidates thus generated are then subjected to homology-based database search programs (e.g., BLAST or FASTA). These homology search programs, however, were not developed with mass spectrometry in mind, and it became necessary to make minor modifications such that mass spectrometric ambiguities can be taken into account when comparing query and database sequences. Finally, this review will discuss the important issue of validating protein identifications. All of the search programs will produce a top ranked answer; however, only the credulous are willing to accept them carte blanche.

9.
Han X, He L, Xin L, Shan B, Ma B. Journal of Proteome Research, 2011, 10(7): 2930-2936
Tandem mass spectrometry (MS/MS) has been routinely used to identify peptides from a protein sequence database. To identify post-translationally modified peptides, most existing software requires the specification of a few possible modifications. However, such knowledge of possible modifications is not always available. In this paper, we describe a new algorithm for identifying modified peptides without requiring the user to specify the possible modifications; instead, all modifications from the Unimod database are considered. Meanwhile, several new techniques are employed to avoid the exponential growth of the search space, as well as to control the false discoveries due to this unrestricted search approach. Finally, a software tool, PeaksPTM, has been developed and has already achieved a stronger performance than competitive tools for unrestricted identification of post-translational modifications.

10.
A computer program allowing the correct alignment of peptides generated by a first cleaving agent during protein sequence determination studies has been developed. The program elaborates data obtained from fast atom bombardment mass spectrometric analysis of different digests of the protein. The recorded mass values are used to identify peptides in these digests that overlap peptides from the first cleavage, thus making it possible to establish unambiguously the correct order of these peptides in the protein chain. This procedure has been tested on a model protein by reconstructing the complete sequence of human beta-globin chain, determining the correct alignment of 14 tryptic peptides.

11.
Analysing proteomic data   (cited by: 5; self-citations: 0; cited by others: 5)
The rapid growth of proteomics has been made possible by the development of reproducible 2D gels and biological mass spectrometry. However, despite technical improvements 2D gels are still less than perfectly reproducible and gels have to be aligned so spots for identical proteins appear in the same place. Gels can be warped by a variety of techniques to make them concordant. When gels are manipulated to improve registration, information is lost, so direct methods for gel registration which make use of all available data for spot matching are preferable to indirect ones. In order to identify proteins from gel spots, a property or combination of properties that is unique to that protein is required. These can then be used to search databases for possible matches. Molecular mass, pI, amino acid composition and short sequence tags can all be used in database searches. Currently the method of choice for protein identification is mass spectrometry. Proteins are eluted from the gels and cleaved with specific endoproteases to produce a series of peptides of different molecular mass. In peptide mass fingerprinting, the peptide profile of the unknown protein is compared with theoretical peptide libraries generated from sequences in the different databases. Tandem mass spectrometry (MS/MS) generates short amino acid sequence tags for the individual peptides. These partial sequences combined with the original peptide masses are then used for database searching, greatly improving specificity. Increasingly protein identification from MS/MS data is being fully or partially automated. When working with organisms that do not have sequenced genomes (the case with most helminths), protein identification by database searching becomes problematic. 
A number of approaches to cross species protein identification have been suggested, but if the organism being studied is only distantly related to any organism with a sequenced genome then the likelihood of protein identification remains small. The dynamic nature of the proteome means that there really is no such thing as a single representative proteome and a complete set of metadata (data about the data) is going to be required if the full potential of database mining is to be realised in the future.

12.
A novel hybrid methodology for the automated identification of peptides via de novo integer linear optimization, local database search, and tandem mass spectrometry is presented in this article. A modified version of the de novo identification algorithm PILOT is utilized to construct accurate de novo peptide sequences. A modified version of the local database search tool FASTA is used to query these de novo predictions against the nonredundant protein database to resolve any low-confidence amino acids in the candidate sequences. The computational burden associated with performing several alignments is alleviated with the use of distributive computing. Extensive computational studies are presented for this new hybrid methodology, as well as comparisons with MASCOT for a set of 38 quadrupole time-of-flight (QTOF) and 380 OrbiTrap tandem mass spectra. The results for our proposed hybrid method for the OrbiTrap spectra are also compared with a modified version of PepNovo, which was trained for use on high-precision tandem mass spectra, and the tag-based method InsPecT. The de novo sequences of PILOT and PepNovo are also searched against the nonredundant protein database using CIDentify to compare with the alignments achieved by our modifications of FASTA. The comparative studies demonstrate the excellent peptide identification accuracy gained from combining the strengths of our de novo method, which is based on integer linear optimization, and database driven search methods.

13.
Mass spectrometry combined with database searching has become the preferred method for identifying proteins in proteomics projects. Proteins are digested by one or several enzymes to obtain peptides, which are analyzed by mass spectrometry. We introduce a new family of scoring schemes, named OLAV, aimed at identifying peptides in a database from their tandem mass spectra. OLAV scoring schemes are based on signal detection theory, and exploit mass spectrometry information more extensively than previously existing schemes. We also introduce a new concept of structural matching that uses pattern detection methods to better separate true from false positives. We show the superiority of OLAV scoring schemes compared to MASCOT, a widely used identification program. We believe that this work introduces a new way of designing scoring schemes that are especially adapted to high-throughput projects such as the GeneProt large-scale human plasma project, where it is impractical to check all identifications manually.

14.
SUMMARY: The database of structural motifs in proteins (DSMP) contains data relevant to helices, beta-turns, gamma-turns, beta-hairpins, psi-loops, beta-alpha-beta motifs, beta-sheets, beta-strands and disulphide bridges extracted from all proteins in the Protein Data Bank, primarily using the PROMOTIF program, and is implemented as a web-based network service using the SRS. The data corresponding to the structural motifs include: sequence, position in polypeptide chain, geometry, type, unique code, keywords and resolution of crystal structure. These data are available for a representative data set of 1028 protein chains and also for all 10,213 proteins in the Protein Data Bank. The three-dimensional coordinates for all structural motifs (except sheet and disulphide bridge) are also available for the representative data set. Using features in SRS, DSMP can be queried to extract information from one or more structural motifs that may be useful for sequence-structure analysis, prediction, modelling or design. AVAILABILITY: http://www.cdfd.org.in/dsmp.html

15.
Peptides play crucial roles in many physiological events. However, a database for endogenous peptides has not yet been developed, because the peptides are easily degraded by proteolytic enzymes during extraction and purification. In this study, we demonstrated that the data for endogenous peptides could be collected by minimizing the proteolytic degradation. We separated porcine brain peptides into 5250 fractions by 2-dimensional chromatography (first ion-exchange and second reversed-phase high-performance liquid chromatography), and 75 fractions of average peptide contents were analyzed in detail by mass spectrometers and a protein sequencer. Based on the analysis data obtained in this study, more than 10000 peptides were deduced to be detected, and more than 1000 peptides to be identified starting from 2 g of brain tissue. Thus, we deduce that it is possible to construct a database for endogenous peptides starting from a gram level of tissue by using 2-dimensional high-performance liquid chromatography coupled with a mass spectrometer.

16.
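Identification of peptides and proteins by capillary zone electrophoresis/tandem mass spectrometry   (cited by: 11; self-citations: 3; cited by others: 8)
A highly sensitive method for identifying peptides and proteins by capillary zone electrophoresis coupled with tandem mass spectrometry (CZE/MS/MS) was established. A mixture of Met-enkephalin and Leu-enkephalin was analyzed, and the sequence of each was verified by CZE/MS/MS. The tryptic digest of cytochrome c was likewise subjected to peptide mass analysis by CZE/MS/MS, and the sequences of almost all peptides and their positions in the molecule were determined. Searching a protein sequence database with the SEQUEST software gave an accurate identification. The amount of sample consumed in each case was in the low pico…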

17.
Protein Z, a vitamin K-dependent plasma protein, has been detected for the first time in a renal calculus along with osteopontin and prothrombin. The renal calculus was obtained from a hyperuricemic patient. Following two-dimensional polyacrylamide gel electrophoresis, the calculus was analyzed with the use of liquid chromatography mass spectrometry (LC-MS). The spectrometer was equipped with a nanoelectrospray interface and an ion trap. Four peptides were determined from a protein in the calculus through LC-MS/MS analysis. Tandem mass spectrum database matching tools were used to identify the protein as protein Z. Authentic protein Z was also analyzed using the same method, and all four peptides determined in the calculus were similarly identified. Whereas protein Z has been reported to be one of the vitamin K-dependent calcium-binding proteins, its role has not been well established. The fact that protein Z exists in a renal calculus composed of calcium oxalate will be beneficial in any future investigations into its role in the body.

18.
The analysis of disulphide bond containing proteins in the Protein Data Bank (PDB) revealed that out of 27,209 protein structures analyzed, 12,832 proteins contain at least one intra-chain disulphide bond and 811 proteins contain at least one inter-chain disulphide bond. The intra-chain disulphide bond containing proteins can be grouped into 256 categories based on the number of disulphide bonds and the disulphide bond connectivity patterns (DBCPs) that were generated according to the position of half-cystine residues along the protein chain. The PDB entries corresponding to these 256 categories represent 509 unique SCOP superfamilies. A simple web-based computational tool is made freely available at the website http://www.ccmb.res.in/bioinfo/dsbcp that allows flexible queries to be made on the database in order to retrieve useful information on the disulphide bond containing proteins in the PDB. The database is useful to identify the different SCOP superfamilies associated with a particular disulphide bond connectivity pattern or vice versa. It is possible to define a query based either on a single field or a combination of the following fields, i.e., PDB code, protein name, SCOP superfamily name, number of disulphide bonds, disulphide bond connectivity pattern and the number of amino acid residues in a protein chain, and to retrieve information that matches the criterion. Thereby, the database may be useful to select suitable protein structural templates in order to model the more distantly related protein homologs/analogs using the comparative modeling methods.
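
A connectivity pattern "generated according to the position of half-cystine residues along the protein chain", as in abstract 18, can be derived by ranking the bonded cysteines in sequence order and rewriting each bond in terms of those ranks. The sketch below assumes that reading; the output notation (`"1-6, 2-5, 3-4"`) and the function name `connectivity_pattern` are illustrative choices, and the database's own notation may differ.

```python
def connectivity_pattern(bonds):
    """Convert disulphide bonds given as residue-number pairs into a
    connectivity pattern over half-cystine ranks (1 = first along the chain).
    """
    positions = sorted(pos for bond in bonds for pos in bond)
    rank = {pos: index + 1 for index, pos in enumerate(positions)}
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ", ".join(f"{a}-{b}" for a, b in pairs)


# Illustrative residue numbers for a chain with three intra-chain bonds.
example = connectivity_pattern([(3, 40), (4, 32), (16, 26)])
```

Because the pattern depends only on rank order, two chains of different length with the same nesting of bonds fall into the same category, which is what makes the pattern usable as a query field.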

19.
Trichomonas vaginalis causes trichomoniasis, the second most common sexually transmitted disease. The draft genome sequence of T. vaginalis published by The Institute for Genomic Research (TIGR) reveals an abnormally large genome size of 160 Mb. It was speculated that a significant portion of the proteome consists of paralogous proteins. The present study was aimed at the identification and analysis of these paralogous proteins, using an all-against-all search approach. The dataset of proteins was retrieved from the TIGR and TrichDB FTP servers. The BLAST-P program was used to perform all-against-all searches against the protein database of Trichomonas vaginalis available at the NCBI genome database. In the present study about 50,000 proteins were searched, of which 2,700 were found to be paralogous under the rigid selection criteria. The Pfam database search identified a significant number of paralogous proteins, which were further categorized: 1,496 paralogous proteins in Pfam families, 1,027 containing domains, 60 with various repeats, and 1,092 belonging to clans. Such identification and functional annotation of paralogous proteins will also help in removing paralogous proteins from possible drug targets in the future. The presence of a huge number of paralogous proteins across a wide range of gene families and domains may be one of the possible mechanisms involved in T. vaginalis genome expansion and evolution.

20.
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. 
Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
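
The first point of abstract 20, matching spectra against other spectra rather than against sequences, can be illustrated with a toy pairwise comparison. The sketch assumes that two related peptides differing by one modification share some fragment peaks directly and the rest after shifting by the precursor mass difference. It is a simplified illustration of that idea, not the published spectral networks algorithm, and the names `has_match` and `spectral_pair_score` are hypothetical.

```python
def has_match(mass, spectrum, tol=0.02):
    """True if `spectrum` contains a peak within `tol` Da of `mass`."""
    return any(abs(mass - peak) <= tol for peak in spectrum)


def spectral_pair_score(spec_a, mass_a, spec_b, mass_b, tol=0.02):
    """Compare two peak lists without any sequence information.

    Returns (precursor_mass_difference, matched_peaks), where a peak of
    `spec_a` counts as matched if it appears in `spec_b` either unchanged
    or shifted by the precursor mass difference.
    """
    delta = mass_b - mass_a
    matched = sum(
        1 for peak in spec_a
        if has_match(peak, spec_b, tol) or has_match(peak + delta, spec_b, tol)
    )
    return delta, matched
```

Pairs with a high matched-peak count would become edges in a network of related spectra, and a recurring mass difference across many edges would hint at a modification before any peptide is identified.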
