Similar literature
 20 similar articles found.
1.
We demonstrate a new approach to the determination of amino acid composition from tandem mass spectrometrically fragmented peptides using both experimental and simulated data. The approach has been developed to be used as a search-space filter in a protein identification pipeline, with the aim of increased performance above that which could be attained by using immonium ion information. Three automated methods have been developed and tested: one based upon a simple peak traversal, in which all intense ion peaks are treated as being either a b- or y-ion using a wide mass tolerance; a second which uses a much narrower tolerance and does not perform transformations of ion peaks to the complementary type; and the unique fragments method, which allows the b- or y-ion type to be inferred and corroborated using a scan of the other ions present in each peptide spectrum. The combination of these methods is shown to provide a high-accuracy set of amino acid predictions using both experimental and simulated data sets. These high-quality predictions, with an accuracy of over 85%, may be used to identify peptide fragments that are hard to identify using other methods. The data simulation algorithm is also shown a posteriori to be a good model of noiseless tandem mass spectrometric peptide data.
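The two ideas this abstract relies on, reading a residue from the mass gap between two fragment peaks and transforming a peak to its complementary ion type, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tolerance value, and the restriction to singly charged fragments are assumptions made for illustration. The residue masses are standard monoisotopic values, and the relation b + y = [M+H]+ + proton mass holds for singly protonated b/y pairs.

```python
# Monoisotopic residue masses in Da (a subset; I is isobaric with L).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # proton mass in Da


def complement_mz(fragment_mz, precursor_mh):
    """Return the m/z of the complementary singly charged ion.

    For singly protonated fragments, b_i + y_(n-i) = [M+H]+ + proton mass.
    """
    return precursor_mh + PROTON - fragment_mz


def residues_from_peaks(peaks, tol=0.02):
    """Collect residues whose mass matches the gap between any two peaks.

    `tol` (Da) is an illustrative assumption, not a value from the abstract.
    """
    found = set()
    ordered = sorted(peaks)
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            gap = high - low
            for residue, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    found.add(residue)
    return found


# Hypothetical peptide GAS: b1 and b2 ions, and the [M+H]+ precursor.
b1, b2, precursor = 58.028736, 129.065846, 234.108441
print(residues_from_peaks([b1, b2]))   # the b1 -> b2 gap is an alanine
print(complement_mz(b1, precursor))    # m/z of the matching y2 ion
```

A wider tolerance makes the traversal more permissive (for example, K and Q differ by only about 0.036 Da), which is consistent with the abstract's contrast between a wide-tolerance and a narrow-tolerance method.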

2.
A notable inefficiency of shotgun proteomics experiments is the repeated rediscovery of the same identifiable peptides by sequence database searching methods, which often are time-consuming and error-prone. A more precise and efficient method, in which previously observed and identified peptide MS/MS spectra are catalogued and condensed into searchable spectral libraries to allow new identifications by spectral matching, is seen as a promising alternative. To that end, an open-source, functionally complete, high-throughput and readily extensible MS/MS spectral searching tool, SpectraST, was developed. A high-quality spectral library was constructed by combining the high-confidence identifications of millions of spectra taken from various data repositories and searched using four sequence search engines. The resulting library consists of over 30,000 spectra for Saccharomyces cerevisiae. Using this library, SpectraST vastly outperforms the sequence search engine SEQUEST in terms of speed and the ability to discriminate good and bad hits. A unique advantage of SpectraST is its full integration into the popular Trans Proteomic Pipeline suite of software, which facilitates user adoption and provides important functionalities such as peptide and protein probability assignment, quantification, and data visualization. This method of spectral library searching is especially suited for targeted proteomics applications, offering superior performance to traditional sequence searching.
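Spectral matching against a library can be sketched generically as a similarity search over binned spectra. The Python example below uses a plain cosine (normalized dot product) score; the abstract does not state SpectraST's actual scoring function, so the score, the bin width, and the function names here are assumptions for illustration only.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(query, reference, bin_width=1.0):
    """Normalized dot product of two binned spectra (0.0 to 1.0)."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    dot = sum(value * r.get(key, 0.0) for key, value in q.items())
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0


def best_match(query, library):
    """Return the name of the library entry most similar to the query."""
    return max(library, key=lambda name: cosine_similarity(query, library[name]))


# Hypothetical library of two consensus spectra as (m/z, intensity) pairs.
library = {
    "PEPTIDE_A": [(175.1, 40.0), (262.2, 100.0), (375.3, 60.0)],
    "PEPTIDE_B": [(147.1, 80.0), (276.2, 30.0), (405.2, 100.0)],
}
query = [(175.1, 35.0), (262.1, 90.0), (375.2, 70.0)]
print(best_match(query, library))
```

Because each library entry is compared directly with the observed spectrum, the search space is limited to peptides that have been seen before, which is the source of the speed advantage the abstract describes.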

3.
Pitzer E, Masselot A, Colinge J. 《Proteomics》2007, 7(17):3051-3054
De novo peptide sequencing algorithms are often tested on relatively small data sets made of excellent spectra. As ever more tandem mass spectra become available, we have assembled six large, reliable, and diverse (three mass spectrometer types) data sets intended for such tests, and we make them accessible via a web server. To exemplify their use, we investigate the performance of Lutefisk, PepNovo, and PepNovoTag, three well-established peptide de novo sequencing programs.

4.
One of the major bottlenecks in the proteomics field today resides in the computational interpretation of the massive data generated by the latest generation of high-throughput MS instruments. MS/MS datasets are constantly increasing in size and complexity, and it becomes challenging to comprehensively process such huge datasets and afterwards deduce the most relevant biological information. The Mass Spectrometry Data Analysis (MSDA, https://msda.unistra.fr ) online software suite provides a series of modules for in-depth MS/MS data analysis. It includes a custom database generation toolbox, modules for filtering and extracting high-quality spectra, for running high-performance database and de novo searches, and for extracting modified-peptide spectra and functional annotations. Additionally, MSDA enables running the most computationally intensive steps, namely database and de novo searches, on a computer grid, thus providing a net time gain of up to 99% for data processing.

5.
Advanced proteomic research efforts involving areas such as systems biology or biomarker discovery are enabled by the use of high level informatics tools that allow the effective analysis of large quantities of differing types of data originating from various studies. Performing such analyses on a large scale is not feasible without a computational platform that performs data processing and management tasks. Such a platform must be able to provide high-throughput operation while having sufficient flexibility to accommodate evolving data analysis tools and methodologies. The Proteomics Research Information Storage and Management system (PRISM) provides a platform that serves the needs of the accurate mass and time tag approach developed at Pacific Northwest National Laboratory. PRISM incorporates a diverse set of analysis tools and allows a wide range of operations to be incorporated by using a state machine that is accessible to independent, distributed computational nodes. The system has scaled well as data volume has increased over several years, while allowing adaptability for incorporating new and improved data analysis tools for more effective proteomics research.

6.
High quality, ultra-fast bioanalytical LC/MS/MS methods were developed using short columns packed with fused-core particles and high (1.0–3.0 mL/min) flow rates. For more than two years, at flow rates up to 3.0 mL/min, using 0.33 min non-ballistic gradients, these methods were shown to provide comparable or better performance than slower assays for accuracy, precision, sensitivity, specificity, and ruggedness, and met all criteria required by the bioanalytical regulatory guidance.

7.
Several academic software packages are available to help with the validation and reporting of proteomics data generated by MS analyses. However, to our knowledge, none of them has been conceived to meet the particular needs generated by the study of organisms whose genomes are not sequenced. In that context, we have developed OVNIp, an open-source application which facilitates the whole process of proteomics results interpretation. One of its unique attributes is its capacity to compile multiple results (from several search engines and/or several databank searches) with a resolution of conflicting interpretations. Moreover, OVNIp enables automated exploitation of de novo sequences generated from unassigned MS/MS spectra, leading to higher sequence coverage and enhancing confidence in the identified proteins. The exploitation of these additional spectra might also identify novel proteins through an MS-BLAST search, which can be easily run from the OVNIp interface. Beyond this primary scope, OVNIp can also benefit users who look for a simple standalone application to both visualize and confirm MS/MS result interpretations through a simple graphical interface and generate reports according to user-defined forms which may integrate the prerequisites for publication. Sources, documentation and a stable release for Windows are available at http://wwwappli.nantes.inra.fr:8180/OVNIp .

8.
In order to maximize protein identification by peptide mass fingerprinting, noise peaks must be removed from spectra, and recalibration is often required. The preprocessing of the spectra before database searching is essential but time-consuming. Moreover, the optimal database search parameters often vary over a batch of samples. For high-throughput protein identification, these factors should be set automatically, with little or no human intervention. In the present work, automated batch filtering and recalibration using a statistical filter is described. The filter is combined with multiple database searches that are performed automatically. We show that, using several hundred protein digests, protein identification rates could be more than doubled compared to standard database searching. Furthermore, automated large-scale in-gel digestion of proteins with endoproteinase LysC and matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) analysis, followed by trypsin digestion and MALDI-TOF analysis, were performed. Several proteins could be identified only after digestion with one of the enzymes, and some less significant protein identifications were confirmed after digestion with the other enzyme. The results indicate that identification of especially small and low-abundance proteins could be significantly improved after sequential digestions with two enzymes.

9.
An improved method for peptide sequencing based on acetylation/deuteroacetylation in conjunction with ESI MS is introduced. Derivatization with a 1:1 mixture of acetic anhydride and deuterated acetic anhydride incorporates a stable isotope label into the analyzed molecule. This approach was initially applied to FAB. Using MS/MS, the technique provides a fast, highly sensitive and reliable determination of the primary structure of unknown peptides. The procedure labels N-terminal fragments formed during MS/MS analysis, resulting in simpler and faster interpretation of the spectra. The performance of the method has been tested with several synthetic peptides and applied to efficient sequencing of a peptide map, using nano-scale LC coupled on-line to a tandem mass spectrometer.
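With a 1:1 mixture of light and deuterated reagent, a labelled N-terminal fragment would be expected to appear as a pair of peaks of similar intensity. The Python sketch below searches a peak list for such doublets. It assumes a single acetyl label per fragment (CH3CO versus CD3CO, a shift of three deuterium-for-hydrogen substitutions, about 3.0188 Da) and a known fragment charge; the tolerance, the intensity-ratio cutoff, and the function name are illustrative assumptions rather than details from the abstract.

```python
# Mass shift for three H -> D substitutions (CD3CO vs CH3CO), in Da.
D3_SHIFT = 3 * (2.014102 - 1.007825)  # about 3.0188


def find_doublets(peaks, charge=1, tol=0.01, min_ratio=0.5):
    """Return (light_mz, heavy_mz) pairs spaced by the d3-acetyl shift.

    peaks: iterable of (m/z, intensity). `tol` and `min_ratio` are
    illustrative assumptions; a 1:1 reagent mix should give ratios near 1.
    """
    spacing = D3_SHIFT / charge
    ordered = sorted(peaks)
    doublets = []
    for i, (mz_light, int_light) in enumerate(ordered):
        for mz_heavy, int_heavy in ordered[i + 1:]:
            gap = mz_heavy - mz_light
            if gap > spacing + tol:
                break  # peaks are sorted, so later ones are even further away
            if abs(gap - spacing) <= tol:
                strongest = max(int_light, int_heavy)
                if strongest > 0 and min(int_light, int_heavy) / strongest >= min_ratio:
                    doublets.append((mz_light, mz_heavy))
    return doublets


# Hypothetical spectrum: one balanced doublet, one lone peak, one unbalanced pair.
spectrum = [(100.0, 10.0), (103.0188, 9.0), (150.0, 5.0),
            (200.0, 8.0), (203.0188, 2.0)]
print(find_doublets(spectrum))
```

Peaks that pair up this way would be candidate N-terminal fragments, while unpaired peaks would be candidate C-terminal fragments, which is the simplification of the spectrum that the abstract refers to.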

10.
LC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, combined with data processing, stringent, and sequence-similarity database searching tools, was employed in a layered manner to identify proteins in organisms with unsequenced genomes. Highly specific stringent searches (MASCOT) were applied as a first layer screen to identify either known (i.e. present in a database) proteins, or unknown proteins sharing identical peptides with related database sequences. Once the confidently matched spectra were removed, the remainder was filtered against a nonannotated library of background spectra that cleaned up the dataset from spectra of common protein and chemical contaminants. The rectified spectral dataset was further subjected to rapid batch de novo interpretation by PepNovo software, followed by the MS BLAST sequence-similarity search that used multiple redundant and partially accurate candidate peptide sequences. Importantly, a single dataset was acquired at the uncompromised sensitivity with no need of manual selection of MS/MS spectra for subsequent de novo interpretation. This approach enabled a completely automated identification of novel proteins that were, otherwise, missed by conventional database searches.

11.
Xia D, Ghali F, Gaskell SJ, O'Cualain R, Sims PF, Jones AR. 《Proteomics》2012, 12(12):1912-1916
Ion mobility (IM) MS instruments can provide an added dimension to peptide analysis pipelines in proteomics, but, as yet, few software tools are available for analysing such data. IM can be used to provide additional separation of parent ions or of product ions following fragmentation. In this work, we have created a set of software tools that are capable of converting three-dimensional IM data generated from analysis of fragment ions into a variety of formats used in proteomics. We demonstrate that IM can be used to calculate the charge state of a fragment ion, demonstrating the potential to improve peptide identification by excluding non-informative ions from a database search. We also provide preliminary evidence of structural differences between b and y ions for certain peptide sequences but not others. All software tools and data sets are made available in the public domain at http://code.google.com/p/ion-mobility-ms-tools/.

12.
For organisms without a sequenced genome and lacking a complete protein database, protein identification and quantitation carry an added level of complexity. De novo sequencing, new bioinformatics tools, and mass spectrometry (MS) techniques allow for advances in this area. Here, the proteomic characterization of an unsequenced psychrophilic bacterium, Pedobacter cryoconitis, is presented employing a novel workflow based on ¹⁵N metabolic labelling, 2DE, MS/MS, and bioinformatics tools. Two bioinformatics pipelines, based on nitrogen constraint (N-constraint), ortholog searching, and de novo peptide sequencing with N-constraint similarity database search, are compared based on proteome coverage and throughput. Results demonstrate the effect of different growth temperatures (1°C, 20°C) and different carbon sources (glucose, maltose) on the proteome. Seventy-six and 69 proteins were identified and validated from the glucose- and maltose-grown bacterium, respectively, of which 21 and 22 were differentially expressed at different growth temperatures. Differentially expressed proteins are involved in stress response and carbohydrate metabolism, with higher expression at 20°C than at 1°C, while antioxidants were upregulated at 1°C. This study provides an alternative workflow to identify, validate, and quantify proteins from unsequenced organisms distantly related to other species in the protein database. Furthermore, it provides further understanding of bacterial adaptation mechanisms to cold environments, and a comparative proteomic analysis with other psychrophilic microorganisms.

13.
In proteomic studies, assigning protein identity for organisms whose genomes are yet to be completely sequenced remains a challenging task. For these organisms, protein identification is typically based on cross-species matching of amino acid sequences obtained from collision-induced dissociation (CID) of peptides using mass spectrometry. The most direct approach, de novo sequencing, is slow and often difficult, due to the complexity of the resultant CID spectra. For MALDI-MS, this problem has been addressed by using chemical derivatisation to direct peptide fragmentation, thereby simplifying CID spectra and facilitating de novo interpretation. In this study, milk whey proteins from the tammar wallaby (Macropus eugenii) were used to evaluate three chemical derivatisation methods compatible with MALDI MS/MS. These methods included (i) guanidination and sulfonation using chemically-assisted fragmentation (CAF), (ii) guanidination and sulfonation using 4-sulfophenyl isothiocyanate (SPITC) and (iii) derivatising the epsilon-amino group of lysine residues with Lys Tag 4H. Derivatisation with CAF and SPITC resulted in more protein identifications than Lys Tag 4H. Sulfonation using SPITC was the preferred method due to the low cost per experiment, the reactivity with both lysine- and arginine-terminated peptides and the resultant simplified MS/MS spectra. *Australian Peptide Conference Issue. **This project was funded by an ARC Linkage grant to Deane supported by TGR Biosciences and facilitated by access to the Australian Proteome Analysis Facility established under the Australian Government’s Major National Research Facilities program.

14.
MS-based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression-based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
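The target-decoy idea and the role of isotonic regression can be sketched in Python as follows. The first function computes a cumulative FDR as the decoy-to-target ratio above each score; the second is a generic pool-adjacent-violators fit, applied here to decoy indicators ordered by decreasing score so that the estimated decoy fraction never decreases as scores get worse. How the paper converts such a fit into its FDR and LFDR estimates is not given in the abstract, so this is an illustrative assumption rather than the published method, and all names are hypothetical.

```python
def cumulative_fdr(psms):
    """Estimate FDR at each score as decoys / targets among better matches.

    psms: list of (score, is_decoy). Returns (score, fdr) by descending score.
    """
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    decoys = targets = 0
    result = []
    for score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        result.append((score, decoys / max(targets, 1)))
    return result


def pava_non_decreasing(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def local_decoy_fraction(psms):
    """Monotone estimate of the decoy fraction along descending score."""
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    return pava_non_decreasing([1 if is_decoy else 0 for _, is_decoy in ordered])


# Hypothetical matches: (score, is_decoy).
psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(cumulative_fdr(psms))
print(local_decoy_fraction(psms))
```

The monotone fit needs no assumption about the shape of the score distributions, which matches two of the advantages the abstract lists: monotonicity in the score and independence from distributional assumptions.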

15.
Intragenomic variation is the molecular variation within the genome among repetitive DNA. As a multigene family, nuclear ribosomal DNA (rDNA) has been widely used in fungal taxonomy for its ease of amplification and its variability, which is suitable for attaining various levels of taxonomic resolution. At the intraspecific level, rDNA is believed to be under concerted evolution, and the internal transcribed spacer (ITS) region is currently accepted as a universal barcoding marker for fungi. However, documentation of intragenomic variation of rDNA indicates that it can be problematic in species delimitation and identification. Fungal taxonomic studies have not generally taken the intragenomic variation of rDNA into account in a systematic manner. In this review, our objective is to address the definition, the origin and the mechanisms for maintenance of intragenomic variation, as well as its implications in the domain of fungal molecular taxonomy, particularly for species delimitation, identification and DNA barcoding. We also address how advanced sequencing technologies (second and third generation) can be used to study the intragenomic variation of rDNA, and how intragenomic variation will affect DNA barcoding via high-throughput sequencing.

16.
In tandem mass spectrometry (MS/MS), several different fragmentation techniques are possible, including collision-induced dissociation (CID), higher energy collisional dissociation (HCD), electron-capture dissociation (ECD), and electron transfer dissociation (ETD). When using pairs of spectra for de novo peptide sequencing, the most popular methods are designed for CID (or HCD) and ECD (or ETD) spectra because of the complementarity between them. Less attention has been paid to the use of CID and HCD spectra pairs. In this study, a new de novo peptide sequencing method is proposed for these spectra pairs. This method includes a CID and HCD spectra merging criterion and a parent mass correction step, along with improvements to our previously proposed algorithm for sequencing merged spectra. Three pairs of spectral datasets were used to investigate and compare the performance of the proposed method with other existing methods designed for single spectrum (HCD or CID) sequencing. Experimental results showed that full-length peptide sequencing accuracy was increased significantly by using spectra pairs in the proposed method, with the highest accuracy reaching 81.31%.

17.
For bottom-up proteomics, there is a wide variety of database-searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, numerous strategies are employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid-search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection, referred to as STEPS, utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal “parameter set” for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true-positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
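A grid search over filtering criteria can be sketched in a few lines of Python. The abstract does not name the parameters STEPS varies or its exact objective, so the two filters used here (a minimum score and a maximum precursor mass error in ppm), the decoy-based FDR constraint, and the function name are hypothetical choices made only to illustrate the technique.

```python
from itertools import product


def grid_search(psms, score_grid, ppm_grid, max_fdr=0.01):
    """Try every (min_score, max_ppm) pair and keep the best one.

    psms: list of dicts with keys "score", "ppm" and "decoy" (bool).
    Returns (target_count, min_score, max_ppm), or None if no combination
    satisfies the decoy-based FDR limit. All parameter names are assumptions.
    """
    best = None
    for min_score, max_ppm in product(score_grid, ppm_grid):
        kept = [p for p in psms
                if p["score"] >= min_score and abs(p["ppm"]) <= max_ppm]
        decoys = sum(1 for p in kept if p["decoy"])
        targets = len(kept) - decoys
        if targets == 0 or decoys / targets > max_fdr:
            continue  # nothing identified, or too many decoys pass the filter
        if best is None or targets > best[0]:
            best = (targets, min_score, max_ppm)
    return best


# Hypothetical peptide-spectrum matches.
psms = [
    {"score": 50, "ppm": 1.0, "decoy": False},
    {"score": 40, "ppm": 4.0, "decoy": False},
    {"score": 30, "ppm": 2.0, "decoy": True},
    {"score": 20, "ppm": 8.0, "decoy": False},
]
print(grid_search(psms, score_grid=[15, 35], ppm_grid=[5, 10], max_fdr=0.0))
```

Exhaustive enumeration is feasible here because each combination only re-filters an existing result list; the search itself does not need to be rerun for every parameter set.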

18.
19.
Alternative splicing is generally accepted as a mechanism that explains the discrepancy between the number of genes and proteins. We used peptide mass fingerprinting with a theoretical database and scoring method to discover and identify alternative splicing isoforms. Our theoretical database was built using published alternative splicing databases such as ECgene, H-DBAS, and TISA. According to our theoretical database of 190,529 isoforms, 37% of human genes have multiple isoforms. The isoforms produced from a gene partially share common peptide fragments because they have common exons, making it difficult to distinguish isoforms. Therefore, we developed a new method that effectively distinguishes a true isoform among multiple isoforms in a gene. In order to evaluate our algorithm, we made test sets for 4226 protein isoforms extracted from our theoretical database randomly. Consequently, 94% of true isoforms were identified by our scoring algorithm.

20.
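At present, protein identification based on biological mass spectrometry has become one of the supporting technologies of proteomics research. The resulting data are mainly processed by database searching, a major drawback of which is that it cannot identify proteins that are not contained in the database. How to make full use of mass spectrometry data is therefore of great significance for proteome research, and the identification of novel proteins is an important part of this. Novel protein identification is one aspect of protein identification, and novel proteins are defined at three levels according to how much is known about their sequence and function. Building on existing protein identification methods, current approaches to novel protein identification fall into two broad classes: methods that combine de novo sequencing with similar-sequence searching, and methods that search nucleic acid databases such as EST and genomic databases. Each class has its advantages and disadvantages, its own problems, and corresponding strategies for handling them. Researchers can apply and develop different identification methods according to their specific aims, and novel protein identification will become more refined as proteomics research develops.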
