Similar literature
1.
De novo peptide sequencing by mass spectrometry (MS) can determine the amino acid sequence of an unknown peptide without reference to a protein database. MS-based de novo sequencing assumes special importance in focused studies of families of biologically active peptides and proteins, such as hormones, toxins, and antibodies, for which amino acid sequences may be difficult to obtain through genomic methods. These protein families often exhibit sequence homology or characteristic amino acid content; yet, current de novo sequencing approaches do not take advantage of this prior knowledge and, hence, search an unnecessarily large space of possible sequences. Here, we describe an algorithm for de novo sequencing that incorporates sequence constraints into the core graph algorithm and thereby reduces the search space by many orders of magnitude. We demonstrate our algorithm in a study of cysteine-rich toxins from two cone snail species (Conus textile and Conus stercusmuscarum) and report 13 de novo and about 60 total toxins.
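
The idea of building a sequence constraint into the graph search can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `constrained_sequences`, the reduced residue alphabet, the tolerance, and the "at least `min_cys` cysteines" constraint are all assumptions chosen for illustration. Nodes are prefix masses, edges join nodes whose mass difference matches one residue, and a branch is abandoned as soon as it can no longer satisfy the constraint.

```python
# Monoisotopic residue masses (Da) for a small illustrative alphabet.
# I (isobaric with L) and Q are left out to keep the example unambiguous.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
}


def constrained_sequences(prefix_masses, min_cys, tol=0.01):
    """Enumerate sequences spelled by paths from the lowest to the highest
    prefix mass that contain at least `min_cys` cysteines (hypothetical
    constraint, used here only to show in-search pruning)."""
    nodes = sorted(prefix_masses)
    target = len(nodes) - 1
    results = []

    def extend(i, seq, cys):
        # At most (target - i) more residues can follow node i, so a branch
        # that cannot reach min_cys any more is pruned immediately.
        if cys + (target - i) < min_cys:
            return
        if i == target:
            results.append(seq)
            return
        for j in range(i + 1, target + 1):
            gap = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    extend(j, seq + aa, cys + (aa == "C"))

    extend(0, "", 0)
    return results


# Prefix masses of the toy peptide CGAC, built from the table above.
masses = [0.0]
for aa in "CGAC":
    masses.append(masses[-1] + RESIDUE_MASS[aa])

print(constrained_sequences(masses, min_cys=2))
```

Checking the constraint inside the recursion, rather than filtering finished sequences, is what shrinks the search space; a real implementation would also have to handle missing peaks, noise, and both ion series.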

2.
The conventional approach in modern proteomics to identify proteins from limited information provided by molecular and fragment masses of their enzymatic degradation products carries an inherent risk of both false positive and false negative identifications. For reliable identification of even known proteins, complete de novo sequencing of their peptides is desired. The main problems of conventional sequencing based on tandem mass spectrometry are incomplete backbone fragmentation and the frequent overlap of fragment masses. In this work, the first proteomics-grade de novo approach is presented, where the above problems are alleviated by the use of the complementary fragmentation techniques CAD and ECD. Implementation of a high-current, large-area dispenser cathode as a source of low-energy electrons provided efficient ECD of doubly charged peptides, the most abundant species (65-80%) in a typical trypsin-based proteomics experiment. A new linear de novo algorithm is developed that combines efficiency and speed, processing 1000 MS/MS data sets in 60 s on a conventional 3 GHz PC. More than 6% of all MS/MS data for doubly charged peptides yielded complete sequences, and another 13% gave nearly complete sequences with a maximum gap of two amino acid residues. These figures are comparable with the typical success rates (5-15%) of database identification. For peptides reliably found in the database (Mowse score ≥ 34), the agreement with de novo-derived full sequences was >95%. Full sequences were derived in 67% of the cases when full sequence information was present in MS/MS spectra. Thus the new de novo sequencing approach reached the same level of efficiency and reliability as conventional database-identification strategies.

3.
MOTIVATION: Peptide identification following tandem mass spectrometry (MS/MS) is usually achieved by searching for the best match between the mass spectrum of an unidentified peptide and model spectra generated from peptides in a sequence database. This methodology will be successful only if the peptide under investigation belongs to an available database. Our objective is to develop and test the performance of a heuristic optimization algorithm capable of dealing with some features commonly found in actual MS/MS spectra that tend to stop simpler deterministic solution approaches. RESULTS: We present the implementation of a Genetic Algorithm (GA) in the reconstruction of amino acid sequences using only spectral features, discuss some of the problems associated with this approach and compare its performance to a de novo sequencing method. The GA can potentially overcome some of the most problematic aspects associated with de novo analysis of real MS/MS data such as missing or unclearly defined peaks and may prove to be a valuable tool in the proteomics field. We assess the performance of our algorithm under conditions of perfect spectral information, in situations where key spectral features are missing, and using real MS/MS spectral data.

4.
Kim SI, Kim JY, Kim EA, Kwon KH, Kim KW, Cho K, Lee JH, Nam MH, Yang DC, Yoo JS, Park YM. Proteomics, 2003, 3(12): 2379-2392
As an initial step to the comprehensive proteomic analysis of Panax ginseng C. A. Meyer, protein mixtures extracted from the cultured hairy root of Panax ginseng were separated by two-dimensional polyacrylamide gel electrophoresis (2-DE). The protein spots were analyzed and identified by peptide finger printing and internal amino acid sequencing by matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) and electrospray ionization quadrupole-time of flight mass spectrometry (ESI Q-TOF MS), respectively. More than 300 protein spots were detected on silver stained two-dimensional (2-D) gels using pH 3-10, 4-7, and 4.5-5.5 gradients. Major protein spots (159) were analyzed by peptide fingerprinting or de novo sequencing and the functions of 91 of these proteins were identified. Protein identification was achieved using the expressed sequence tag (EST) database from Panax ginseng and the protein database of plants like Arabidopsis thaliana and Oryza sativa. However, peptide mass fingerprinting by MALDI-TOF MS alone was insufficient for protein identification because of the lack of a genome database for Panax ginseng. Only 17 of the 159 protein spots were verified by peptide mass fingerprinting using MALDI-TOF MS whereas 87 out of 102 protein spots, which included 13 of the 17 proteins identified by MALDI-TOF MS, were identified by internal amino acid sequencing using tandem mass spectrometry analysis by ESI Q-TOF MS. When the internal amino acid sequences were used as identification markers, the identification rate exceeded 85.3%, suggesting that a combination of internal sequencing and EST data analysis was an efficient identification method for proteome analysis of plants having incomplete genome data like ginseng. The 2-D patterns of the main root and leaves of Panax ginseng differed from that of the cultured hairy root, suggesting that some proteins are exclusively expressed by different tissues for specific cellular functions. 
Proteome analysis will undoubtedly be helpful for understanding the physiology of Panax ginseng.

5.
6.
MOTIVATION: Peptide sequencing by mass spectrometry uses two approaches: database searching and de novo sequencing. The database-searching approach is convenient; however, when the corresponding sequences are not included in the databases, exact identification is difficult. De novo sequencing, on the other hand, needs no preliminary information; however, it requires a continuous series of amino acid sequence peaks and the ability to differentiate these peaks. In an actual spectrum it is very difficult to obtain and differentiate the peaks of all amino acids. We propose a novel de novo sequencing approach that uses not only mass-to-charge ratio but also ion peak intensity and amino acid cleavage intensity ratio (CIR). RESULTS: Our method compensates for any undetectable amino acid peak intervals by estimating the amino acid set and the probability of peak expression based on amino acid CIR. It identifies sequences more accurately than existing methods, which usually have difficulty sequencing such spectra.

7.
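Proteomic analysis of species with unknown genomes and limited protein sequence databases is currently one of the bottlenecks in proteomics research on non-model organisms. MS BLAST, a BLAST method based on homology searching, is a search tool developed in recent years for identifying proteins from organisms with unknown genomes, and it has been applied successfully to protein identification in many such species. The SPITC chemically assisted method is an improved de novo mass spectrometric sequencing method established in our laboratory. MS BLAST was used to identify 19 goldfish embryo proteins that could not be identified by database searching with Mascot software: 12 of these proteins were identified by MS BLAST searching after direct sequencing, and the other 7 were identified by combining MS BLAST with SPITC derivatization. The results demonstrate that cross-species protein identification with MS BLAST is feasible and reliable, and they provide a new route to cross-species protein identification.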

8.
Protein identification has been greatly facilitated by database searches against protein sequences derived from product ion spectra of peptides. This approach is primarily based on the use of fragment ion mass information contained in a MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions are used to provide additional discriminating power for much improved protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference of corresponding peak pairs in overlaid spectra of (15)N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in (15)N-enriched and normal media, respectively. It is demonstrated that, using 2D information, i.e., m/z and its associated nitrogen number, this method can not only confirm protein identification results generated by MS/MS database searching, but also identify peptides that cannot be identified by database searching alone. Examples are presented of analyzing Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from the digests of low abundance proteins, which can still give positive protein identification using this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and identification of posttranslational or other chemical modifications. We envision that this method should be particularly useful for proteome expression profiling of organelles or cells that can be grown in (15)N-enriched media.
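
The nitrogen number described above follows from simple arithmetic on a peak pair, which the sketch below illustrates. The function name `nitrogen_count` is hypothetical, and the calculation assumes complete (15)N incorporation and a known charge state; neither assumption is taken from the abstract.

```python
# Mass gained per nitrogen atom when 14N is replaced by 15N (Da),
# from the isotope masses 15.0001089 and 14.0030740.
N15_N14_SHIFT = 15.0001089 - 14.0030740


def nitrogen_count(mz_unlabeled, mz_labeled, charge=1):
    """Estimate the number of nitrogen atoms in an ion from the m/z shift
    between its unlabeled and fully 15N-labeled forms."""
    mass_shift = (mz_labeled - mz_unlabeled) * charge
    return round(mass_shift / N15_N14_SHIFT)


# A doubly charged ion whose labeled peak sits about 2.991 m/z units higher
# carries six nitrogens.
print(nitrogen_count(500.0, 500.0 + 3 * N15_N14_SHIFT, charge=2))
```

With incomplete labeling the observed shift falls below an integer multiple of the per-atom value, so rounding becomes less reliable and the labeled isotope envelope would need to be modelled explicitly.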

9.
Lack of genomic sequence data and the relatively high cost of tandem mass spectrometry have hampered proteomic investigations into helminths, such as resolving the mechanism underpinning globally reported anthelmintic resistance. Whilst detailed mechanisms of resistance remain unknown for the majority of drug-parasite interactions, gene mutations and changes in gene and protein expression are proposed key aspects of resistance. Comparative proteomic analysis of drug-resistant and -susceptible nematodes may reveal protein profiles reflecting drug-related phenotypes. Using the gastro-intestinal nematode, Haemonchus contortus as case study, we report the application of freely available expressed sequence tag (EST) datasets to support proteomic studies in unsequenced nematodes. EST datasets were translated to theoretical protein sequences to generate a searchable database. In conjunction with matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-TOF-MS), Peptide Mass Fingerprint (PMF) searching of databases enabled a cost-effective protein identification strategy. The effectiveness of this approach was verified in comparison with MS/MS de novo sequencing with searching of the same EST protein database and subsequent searches of the NCBInr protein database using the Basic Local Alignment Search Tool (BLAST) to provide protein annotation. Of 100 proteins from 2-DE gel spots, 62 were identified by MALDI-TOF-MS and PMF searching of the EST database. Twenty randomly selected spots were analysed by electrospray MS/MS and MASCOT Ion Searches of the same database. The resulting sequences were subjected to BLAST searches of the NCBI protein database to provide annotation of the proteins and confirm concordance in protein identity from both approaches. Further confirmation of protein identifications from the MS/MS data were obtained by de novo sequencing of peptides, followed by FASTS algorithm searches of the EST putative protein database. 
This study demonstrates the cost-effective use of available EST databases and inexpensive, accessible MALDI-TOF MS in conjunction with PMF for reliable protein identification in unsequenced organisms.

10.
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
Database search tools, such as Sequest (3), Mascot (4), and InsPecT (5), are the most frequently used methods for reliable protein identification in tandem mass (MS/MS) spectrometry based proteomics. These operate by separately matching each MS/MS spectrum to peptide sequences from reference protein databases where all proteins of interest are presumably contained.
But this assumption often does not hold true as many important proteins, such as monoclonal antibodies, are not contained in any database because mechanisms of antibody variation (including genetic recombination and somatic hyper-mutation (6)) constantly create new proteins with novel unique sequences. These mechanisms of variation are the foundation of adaptive immune systems and have enabled highly successful antibody-based therapeutic strategies (7, 8). Nevertheless, such variation also means that antibody MS/MS spectra are typically impossible to identify via standard database search techniques whenever the corresponding sequences are not known in advance. An inherent drawback of database search strategies is that they are only as good as the database(s) being searched and incomplete databases often result in proteins being misidentified or left unidentified (9).
Despite the importance of novel protein identification, few high-throughput methods have been developed for de novo sequencing of unknown proteins. Low-throughput Edman degradation is a well-known de novo sequencing approach that can accurately call amino acid sequences in N/C-terminal regions of unknown proteins but has drawbacks that make it unsuitable for sequencing proteins longer than 50 amino acids or proteins with post-translational modifications (10, 11). Many have recognized the potential of tandem mass spectrometry for protein sequencing. For example, in 1987 Johnson and Biemann (12) manually sequenced a complete protein from rabbit bone marrow. Meanwhile, automated de novo sequencing methods that rely on interpretations of individual MS/MS spectra are limited in that they typically cannot reconstruct long (8+ AA) sequences without mis-predicting 1 in 5 AA on average for low accuracy collision-induced dissociation (CID) spectra (13, 14).
Recent advances in de novo peptide sequencing have improved sequencing accuracy to over 95% for high resolution higher energy collisional dissociation (HCD) spectra (15), but at limited sequence coverage (Chi H et al. report only 55% sequence coverage of peptides identified by database search). In fact, all current per-spectrum de novo sequencing strategies face a significant tradeoff between sequencing accuracy and coverage as spectra exhibiting complete peptide fragmentation rarely cover entire target proteins, yet are required to accurately reconstruct full-length peptide sequences. An alternative approach to separately sequencing individual spectra is to simultaneously interpret multiple MS/MS spectra from overlapping peptides. This Shotgun Protein Sequencing (SPS) paradigm differs from traditional algorithms by deriving consensus sequences from contigs - sets of multiple MS/MS spectra from distinct peptides with overlapping sequences (1, 16). Because SPS aggregates multiple spectra from overlapping peptides, protein sequences extending beyond the length of enzymatically digested peptides can be extracted from spectra with incomplete peptide fragmentation. Furthermore, SPS has been found to generate sequences that frequently cover 90–95+% of the target protein sequence(s) while mis-predicting only 1 out of every 20 amino acids on high resolution MS/MS spectra (2). But a remaining limitation of SPS is that it still generates fragmented sequences that do not singularly cover large regions of the target protein sequences, much less complete proteins: SPS sequences have an average length of 10–15 amino acids (depending on input data) and the longest recovered SPS de novo sequence is less than 45 amino acids long (1).
The considerable limitations of de novo sequencing strategies have typically been addressed by attempting to circumvent them using error-tolerant matching to known protein sequences.
One such strategy (17) is to generate short de novo sequence tags and then match them exactly to protein databases without requiring matching the N/C-term flanking masses (to allow for unexpected polymorphisms or post-translational modifications). Short sequence tags are usually derived from parts of the spectrum with high signal-to-noise ratios and typically have higher sequencing accuracy than full-length de novo sequences (18). This approach was later extended in MS-Shotgun (19) and continues to be a popular technique for speeding up database search tools (5, 20–22). Homology matching of full length de novo sequences was first explored in CIDentify (23) and later in MS-BLAST (24) by searching de novo sequences using FASTA and WU-BLAST2 (respectively) to find homologous matches to sequences of related proteins; FASTS (25) also approached the problem using a modified version of FASTA. However, common de novo sequencing errors tend to produce sequences that are heavily penalized in pure sequence homology searches. For example, missing peaks in MS/MS spectra may easily cause GA subsequences to be reconstructed as Q or AG (same-mass sequences), thus making subsequent BLAST searches unlikely to succeed. This issue was partially considered in CIDentify and more thoroughly addressed in SPIDER (26) by explicitly modeling de novo sequencing errors together with BLOSUM scores in MS/MS-based sequence homology searches. In addition, OpenSea (27) further explored database matching of de novo sequences for analysis of unexpected post-translational modifications (PTMs). Finally, Shen et al. (28) used short unique de novo sequence tags, called UStags, to discover protein-localized PTMs.
Recent approaches to homology matching of de novo sequences have built on genome assembly and sequencing techniques to achieve database-assisted full-length sequencing of unknown proteins.
Comparative Shotgun Protein Sequencing (cSPS) complemented SPS assembly techniques with error-tolerant matching of de novo sequences to find overlapping SPS de novo sequences that are then further assembled into full-length protein sequences (2). cSPS was designed to support the sequencing of highly divergent proteins that have regions close enough in homology to transfer matches from a reference. cSPS was shown to enable de novo sequencing of monoclonal antibodies at 95+% sequencing accuracy, while simultaneously tolerating and identifying unexpected PTMs (29). In contrast to cSPS, Champs (30) de novo sequences individual spectra to obtain putative peptide sequences, which are then mapped to homologous proteins to correct sequencing errors and reconstruct protein sequences with 100% accuracy and 99% coverage. However, Champs is designed to only map peptides that differ from the reference sequence by one or two amino acids and does not handle PTMs. As such, its sequencing accuracy is not directly comparable to that of cSPS as Champs was not designed to sequence highly divergent proteins (such as monoclonal antibodies) with multiple PTMs, insertions, deletions, and/or recombinations. GenoMS (31) extended the approaches in cSPS/Champs by explicitly modeling protein splice variants as paths in splice graphs where nodes represent translated exon regions (32). MS/MS spectra are first searched for exact sequence matches against all possible protein isoforms. The remaining unidentified MS/MS spectra are then aligned to the matched peptides and de novo sequenced to extend the matched sequences into novel regions. Reported sequences are 97–99% accurate and cover 96–99% of target proteins depending on sequence similarity between the novel and reference sequences (31).
However, GenoMS de novo sequences are usually extended less than 3 amino acids beyond matched peptides because sequencing accuracy degrades as sequences are extended, thus preventing the consistent extension of long (10+ AA) sequences. Altogether, the use of homology matching approaches for full-length de novo protein sequencing continues to be limited by 1) requiring the previous knowledge of closely related protein sequences and 2) the inherent difficulties in statistically significant homology-tolerant matching of error-prone short de novo sequences.
The Meta-SPS approach proposed here seeks to de novo sequence complete proteins, or long protein regions, without any use of a database. Meta-SPS builds upon SPS by treating SPS de novo sequences (contig sequences) as input spectra and further assembling them into longer de novo sequences (meta-contig sequences). We show that Meta-SPS extends de novo sequences to lengths over 100 AA while boosting sequencing accuracy to only 1 mistake per 40 amino acid predictions, thus enabling database-free de novo sequencing of completely novel proteins while also allowing error-tolerant matching approaches to support higher-divergence homologies (by searching longer, more accurate de novo sequences). Meta-SPS algorithms are demonstrated on CID and HCD MS/MS spectra and its limitations are discussed in relation to the underlying limitations of bottom-up tandem mass spectrometry.
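
The same-mass error mentioned in the text above (GA reconstructed as Q or AG) can be checked numerically. The sketch below is illustrative only: the helper name `isobaric_pairs`, the reduced residue table, and the 0.01 Da tolerance are assumptions, not part of any of the cited tools.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) for a reduced, illustrative alphabet.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259,
}


def isobaric_pairs(residue, tol=0.01):
    """Return the unordered residue pairs whose summed mass matches the
    mass of a single residue within `tol` Da."""
    target = RESIDUE_MASS[residue]
    return [
        a + b
        for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2)
        if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - target) <= tol
    ]


print(isobaric_pairs("Q"))  # the A+G pair has the same mass as Q
print(isobaric_pairs("N"))  # the G+G pair has the same mass as N
print(isobaric_pairs("K"))  # K differs from A+G by about 0.036 Da
```

If the peak between G and A is missing, the remaining mass gap is consistent with Q, AG, and GA alike, which is why a plain homology search penalizes such reconstructions. At a looser tolerance, as with low-resolution CID data, K would join the ambiguous set as well.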

11.
MS/MS analysis by electrospray ionization quadrupole-time of flight mass spectrometry (ESI-Q-TOF MS) was applied to identify proteins in proteome analysis of bacteria whose genomes are not known. Protein identification by ESI-Q-TOF MS was performed sequentially by database search and then de novo sequencing using MS/MS spectra. Acinetobacter lwoffii K24 is an aniline-degrading soil bacterium whose genome has not been analyzed. In this report, we present the results of a comparison between the proteome profiles of A. lwoffii K24 cultured in aniline- or succinate-containing media. Protein analysis was performed using two-dimensional gel electrophoresis (2-DE) with pH 3-10 immobilized pH gradient (IPG) strips followed by ESI-Q-TOF MS. More than 780 protein spots were detected by 2-DE from the soluble proteome. Forty-eight of these proteins were expressed exclusively in aniline-cultured bacteria, and 81 proteins increased and 162 proteins decreased in aniline-cultured versus succinate-cultured A. lwoffii K24. Internal amino acid sequences of 43 major protein spots were successfully determined by ESI-Q-TOF MS in an attempt to identify the bacterial proteins responding to the aniline culture condition. Since the A. lwoffii K24 genome is not yet sequenced, many proteins were found to be hypothetical. Comparative proteome analysis of the insoluble protein fractions showed that one novel protein that was strongly induced in succinate-cultured A. lwoffii K24 was repressed under aniline culture conditions. These results suggest that comprehensive analysis of bacterial proteomes by 2-DE and amino acid sequence analysis by ESI-Q-TOF MS is useful for understanding induced novel proteins of biodegrading bacteria.

12.
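At present, protein identification based on biological mass spectrometry has become one of the supporting technologies of proteomics research. The data produced are processed mainly by database searching, a major shortcoming of which is that it cannot identify proteins not contained in the database. Making full use of mass spectrometric data is therefore of great significance to proteome research, and the identification of novel proteins is an important part of this. Novel protein identification is one aspect of protein identification, and novel proteins are defined at three levels according to how much is known of their sequence and function. Building on protein identification methods, current methods for novel protein identification fall into two broad classes: methods that combine de novo sequencing with similar-sequence searching, and methods that search nucleic acid databases such as EST and genome databases. Each has advantages and disadvantages, with its own problems and corresponding strategies for handling them. Researchers can apply and develop different identification methods according to their specific aims, and novel protein identification will also mature as proteomics research develops.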

13.
Mass spectrometry-driven BLAST (MS BLAST) is a database search protocol for identifying unknown proteins by sequence similarity to homologous proteins available in a database. MS BLAST utilizes redundant, degenerate, and partially inaccurate peptide sequence data obtained by de novo interpretation of tandem mass spectra and has become a powerful tool in functional proteomic research. Using computational modeling, we evaluated the potential of MS BLAST for proteome-wide identification of unknown proteins. We determined how the success rate of protein identification depends on the full-length sequence identity between the queried protein and its closest homologue in a database. We also estimated phylogenetic distances between organisms under study and related reference organisms with completely sequenced genomes that allow substantial coverage of unknown proteomes.

14.
The recent proliferation of novel mass spectrometers such as Fourier transform, QTOF, and OrbiTrap marks a transition into the era of precision mass spectrometry, providing a 2 orders of magnitude boost to the mass resolution, as compared to low-precision ion-trap detectors. We investigate peptide de novo sequencing by precision mass spectrometry and explore some of the differences when compared to analysis of low-precision data. We demonstrate how the dramatically improved performance of de novo sequencing with precision mass spectrometry paves the way for novel approaches to peptide identification that are based on direct sequence lookups, rather than comparisons of spectra to a database. With the direct sequence lookup, it is not only possible to search a database very efficiently, but also to use the database in novel ways, such as searching for products of alternative splicing or products of fusion proteins in cancer. Our de novo sequencing software is available for download at http://peptide.ucsd.edu/.

15.
De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins, based either on precursor trigger intensity or on total ion current, and allowing for 50%, 30%, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p < .01) corrected protein scores (i.e., above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on ion mass searching using MS/MS data (i.e., Mascot) was compared with that obtained using homology search. The highest-ranking protein was the same using Mascot, homology search using the 20 most intense peptides, or homology search using all peptides, for 63.4% of 112 spots from two-dimensional polyacrylamide gel electrophoresis gels. For these proteins, the percent coverage was greatest using Mascot compared with the use of all or just the 20 most intense peptides in a homology search (25.1%, 18.3%, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased when the search was limited to the 20 most intense peptides (44.0%). After identifying the protein using MS-Homology, a peptide mass search may increase the percent coverage of the protein identified.

16.
Strategic proteome analysis of Candida magnoliae with an unsequenced genome
Kim HJ, Lee DY, Lee DH, Park YC, Kweon DH, Ryu YW, Seo JH. Proteomics, 2004, 4(11): 3588-3599
Erythritol is a noncariogenic, low calorie sweetener. It is safe for people with diabetes and obese people. Candida magnoliae is an industrially important organism because of its ability to produce erythritol as a major product. The genome of C. magnoliae has not been sequenced yet, limiting the available proteome database. Therefore, systematic approaches were employed to construct the proteome map of C. magnoliae. Proteomic analysis with systematic approaches is based on two-dimensional electrophoresis, matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS), tandem mass spectrometry (MS/MS) and database interrogation. First, 24 spots were analyzed using peptide mass fingerprinting along with MALDI-TOF MS with high mass accuracy. Only four spots were reliably identified as carbonyl reductase and its isoforms. The reason for low sequence coverage seemed to be that these identification strategies were based on the presence of the protein database obtained from the publicly accessible genome database and the availability of cross-species protein identification. MS/MS (MS/MS ion search and de novo sequencing) in combination with similarity searches allowed successful identification of 39 spots. Several proteins including transaldolase identified by MS/MS ion searches were further confirmed by partial sequences from the expressed sequence tag database. In this study, 51 protein spots were analyzed and then potentially identified. The identified proteins were involved in glycolysis, stress response, other essential metabolisms and cell structures.

17.
We present AUDENS, a new platform-independent open source tool for automated de novo sequencing of peptides from MS/MS data. We implemented a dynamic programming algorithm and combined it with a flexible preprocessing module which is designed to distinguish between signal and other peaks. By applying a user-defined set of heuristics, AUDENS screens through the spectrum and assigns high relevance values to putative signal peaks. The algorithm constructs a sequence path through the MS/MS spectrum using the peak relevances to score each suggested sequence path, i.e., the corresponding amino acid sequence. At present, we consider AUDENS a prototype that unfolds its biggest potential if used in parallel with other de novo sequencing tools. AUDENS is available open source and can be downloaded with further documentation at http://www.ti.inf.ethz.ch/pw/software/audens/.
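
A dynamic program that scores a sequence path by summed peak relevance can be sketched as follows. This is not the AUDENS code: the function name `best_sequence`, the reduced residue table, the relevance values, and the tolerance are hypothetical, and the sketch treats every peak as a prefix mass.

```python
import math

# Monoisotopic residue masses (Da) for a reduced, illustrative alphabet.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "T": 101.04768, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259,
}


def best_sequence(peaks, tol=0.01):
    """peaks: iterable of (mass, relevance) pairs. Return (score, sequence)
    for the highest-scoring path from the lightest to the heaviest peak,
    where consecutive peaks on a path differ by one residue mass and the
    score is the sum of the relevances of the peaks visited."""
    peaks = sorted(peaks)
    n = len(peaks)
    score = [-math.inf] * n   # best score of a path ending at each peak
    seq = [""] * n            # sequence spelled by that best path
    score[0] = peaks[0][1]
    for j in range(1, n):
        for i in range(j):
            if score[i] == -math.inf:
                continue      # peak i is not reachable from the start
            gap = peaks[j][0] - peaks[i][0]
            for aa, mass in RESIDUE_MASS.items():
                candidate = score[i] + peaks[j][1]
                if abs(gap - mass) <= tol and candidate > score[j]:
                    score[j] = candidate
                    seq[j] = seq[i] + aa
    return score[-1], seq[-1]


# Toy spectrum: prefix masses of GAS plus one low-relevance noise peak.
g = RESIDUE_MASS["G"]
ga = g + RESIDUE_MASS["A"]
gas = ga + RESIDUE_MASS["S"]
peaks = [(0.0, 1.0), (g, 0.8), (100.0, 0.1), (ga, 0.9), (gas, 1.0)]
print(best_sequence(peaks))
```

In this toy spectrum the path through the intermediate G peak outscores the shortcut that reads the same mass gap as Q, so the relevance values are what resolve the ambiguity; the noise peak is never reachable and is ignored.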

18.
Bioinformatics tools for proteomics, also called proteome informatics tools, today span a wide range of very diverse applications, from simple tools that compare protein amino acid compositions to sophisticated software for large-scale protein structure determination. This review considers the available and ready to use tools that can help end-users to interpret, validate and generate biological information from their experimental data. It concentrates on bioinformatics tools for 2-DE analysis, for LC followed by MS analysis, for protein identification by PMF, by peptide fragment fingerprinting and by de novo sequencing and for data quantitation with MS data. It also discloses initiatives that propose to automate the processes of MS analysis and enhance the quality of the obtained results.

19.
LC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, combined with data processing, stringent, and sequence-similarity database searching tools, was employed in a layered manner to identify proteins in organisms with unsequenced genomes. Highly specific stringent searches (MASCOT) were applied as a first layer screen to identify either known (i.e. present in a database) proteins, or unknown proteins sharing identical peptides with related database sequences. Once the confidently matched spectra were removed, the remainder was filtered against a nonannotated library of background spectra that cleaned up the dataset from spectra of common protein and chemical contaminants. The rectified spectral dataset was further subjected to rapid batch de novo interpretation by PepNovo software, followed by the MS BLAST sequence-similarity search that used multiple redundant and partially accurate candidate peptide sequences. Importantly, a single dataset was acquired at the uncompromised sensitivity with no need of manual selection of MS/MS spectra for subsequent de novo interpretation. This approach enabled a completely automated identification of novel proteins that were, otherwise, missed by conventional database searches.

20.
We present and evaluate a strategy for the mass spectrometric identification of proteins from organisms for which no genome sequence information is available that incorporates cross-species information from sequenced organisms. The presented method combines spectrum quality scoring, de novo sequencing and error tolerant BLAST searches and is designed to decrease input data complexity. Spectral quality scoring reduces the number of investigated mass spectra without a loss of information. Stringent quality-based selection and the combination of different de novo sequencing methods substantially increase the catalog of significant peptide alignments. The de novo sequences passing a reliability filter are subsequently submitted to error tolerant BLAST searches and MS-BLAST hits are validated by a sampling technique. With the described workflow, we identified up to 20% more groups of homologous proteins in proteome analyses with organisms whose genome is not sequenced than by state-of-the-art database searches in an Arabidopsis thaliana database. We consider the novel data analysis workflow an excellent screening method to identify those proteins that evade detection in proteomics experiments as a result of database constraints.
