Similar articles
 20 similar articles found (search time: 31 ms)
1.
A derivatization reaction, guanidination, was recently reported that increases MALDI-TOF MS sensitivity toward lysine-terminated peptides. Its application conveys sequence information that can be used as a parameter in peptide mass mapping database searches. This paper presents a systematic study of the impact of guanidination on proteomic analysis of an entire bacterial organelle. Sixty-two 2-D gel isolated proteins from Caulobacter crescentus stalks were studied. A novel computer algorithm, Prodigies, was developed to analyze the data. Absolute confidence limits associated with protein assignments were established using Monte Carlo simulations of database searches. The advantages of guanidination are illustrated using both experimental and theoretical data.

2.
A new method for enhancing peptide ion identification in proteomics analyses using ion mobility data is presented. Ideally, direct comparisons of experimental drift times (t(D)) with a standard mobility database could be used to rank candidate peptide sequence assignments. Such a database would represent only a fraction of sequences in protein databases and significant difficulties associated with the verification of data for constituent peptide ions would exist. A method that employs intrinsic amino acid size parameters to obtain ion mobility predictions that can be used to rank candidate peptide ion assignments is proposed. Intrinsic amino acid size parameters have been determined for doubly charged peptide ions from an annotated yeast proteome. Predictions of ion mobilities using the intrinsic size parameters are more accurate than those obtained from a polynomial fit to t(D) versus molecular weight data. More than a 2-fold improvement in prediction accuracy has been observed for a group of arginine-terminated peptide ions 12 residues in length. The use of this predictive enhancement as a means to aid peptide ion identification is discussed, and a simple peptide ion scoring scheme is presented.

3.
Fast atom bombardment mass spectrometry is used for the analysis of the series of molecular products formed by the cleavage of polypeptide substrates with the exopeptidases carboxypeptidase Y and leucine aminopeptidase. By following the polypeptide molecular species rather than the released residues, sequence information is obtained regardless of the relative rates of cleavage of peptide bonds. In addition, unambiguous assignments of sequence can be made in the presence of multiple identical residues. The lower level of sensitivity for the analysis is in the picomole range. When carboxypeptidase Y is used, the method provides a specific and sensitive method for the sequencing of polypeptides from the C-terminus.
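
The idea of reading a sequence from the truncated molecular species can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a clean ladder of monoisotopic masses from C-terminal digestion, uses a small subset of the standard residue masses, and the function names and the tolerance `tol` are hypothetical. Leucine and isoleucine have the same mass, so "L" stands for either.

```python
# Standard monoisotopic residue masses (Da) for a subset of amino acids.
# "L" also stands for isoleucine, which has the same mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "E": 129.04259, "F": 147.06841, "Y": 163.06333,
}
WATER = 18.01056  # mass of H2O added to the sum of residue masses


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def read_ladder(masses, tol=0.01):
    """Return the residues removed, in order of removal.

    Each difference between neighbouring ladder masses is matched to the
    closest residue mass; "?" marks a difference with no match within tol.
    """
    ordered = sorted(masses, reverse=True)
    residues = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        best = min(RESIDUE_MASS, key=lambda r: abs(RESIDUE_MASS[r] - delta))
        residues.append(best if abs(RESIDUE_MASS[best] - delta) <= tol else "?")
    return "".join(residues)


# Ladder for GAVVF digested from the C-terminus: GAVVF, GAVV, GAV, GA.
ladder = [peptide_mass("GAVVF"[:n]) for n in (5, 4, 3, 2)]
print(read_ladder(ladder))
```

Because each truncated species is observed directly, the two consecutive valines appear as two separate steps in the ladder, which is the point the abstract makes about repeated identical residues.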

4.
MS/MS is a widely used method for proteome‐wide analysis of protein expression and PTMs. The thousands of MS/MS spectra produced from a single experiment pose a major challenge for downstream analysis. Standard programs, such as MASCOT, provide peptide assignments for many of the spectra, including identification of PTM sites, but these results are plagued by false‐positive identifications. In phosphoproteomic experiments, only a single peptide assignment is typically available to support identification of each phosphorylation site, and hence minimizing false positives is critical. Thus, tedious manual validation is often required to increase confidence in the spectral assignments. We have developed phoMSVal, an open‐source platform for managing MS/MS data and automatically validating identified phosphopeptides. We tested five classification algorithms with 17 extracted features to separate correct peptide assignments from incorrect ones using over 2600 manually curated spectra. The naïve Bayes algorithm was among the best classifiers with an AUC value of 97% and PPV of 97% for phosphotyrosine data. This classifier required only three features to achieve a 76% decrease in false positives as compared with MASCOT while retaining 97% of true positives. This algorithm was able to classify an independent phosphoserine/threonine data set with AUC value of 93% and PPV of 91%, demonstrating the applicability of this method for all types of phospho‐MS/MS data. PhoMSVal is available at http://csbi.ltdk.helsinki.fi/phomsval.

5.
The specific assignment of resonances in the 300-MHz 1H nuclear magnetic resonance (NMR) spectrum of anthopleurin-A, a polypeptide cardiac stimulant from the sea anemone Anthopleura xanthogrammica, is described. Assignments have been made using two-dimensional NMR techniques, in particular the method of sequential assignments, where through-bond and through-space connectivities to the peptide backbone NH resonances are used to identify the spin systems of residues adjacent in the amino acid sequence. Complete assignments have been made of the resonances from 33 residues out of a total of 49, and partial assignments of a further 3. The resonances from several of the remaining residues have been identified but not yet specifically assigned. A complicating factor in making these assignments is the conformational heterogeneity exhibited by anthopleurin-A in solution. The resonances from a number of amino acid residues in the minor conformer have also been assigned. These assignments contribute towards identification of the origin of this heterogeneity, and permit some preliminary conclusions to be drawn regarding the secondary structure of the polypeptide.

6.
A probability-based quantification framework is presented for the calculation of relative peptide and protein abundance in label-free and label-dependent LC-MS proteomics data. The results are accompanied by credible intervals and regulation probabilities. The algorithm takes into account data uncertainties via Poisson statistics modified by a noise contribution that is determined automatically during an initial normalization stage. Protein quantification relies on assignments of component peptides to the acquired data. These assignments are generally of variable reliability and may not be present across all of the experiments comprising an analysis. It is also possible for a peptide to be assigned to more than one protein in a given mixture. For these reasons the algorithm accepts a prior probability of peptide assignment for each intensity measurement. The model is constructed in such a way that outliers of any type can be automatically reweighted. Two discrete normalization methods can be employed. The first method is based on a user-defined subset of peptides, while the second method relies on the presence of a dominant background of endogenous peptides for which the concentration is assumed to be unaffected. Normalization is performed using the same computational and statistical procedures employed by the main quantification algorithm. The performance of the algorithm is illustrated on example data sets, and its utility is demonstrated for typical proteomics applications. The quantification algorithm supports relative protein quantification based on precursor and product ion intensities acquired by means of data-dependent methods, originating from all common isotopically-labeled approaches, as well as label-free ion intensity-based data-independent methods.

7.
Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.
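
The simple decoy strategy mentioned in the abstract can be sketched as below. This shows only plain decoy counting, not the semisupervised EM mixture model or PeptideProphet itself. The function names, the `(score, is_decoy)` tuple layout, and the example scores are all hypothetical, and the estimate assumes the decoy database is the same size as the target database.

```python
def reversed_decoy(sequence):
    """Build a decoy entry by reversing a target sequence."""
    return sequence[::-1]


def decoy_fdr(psms, threshold):
    """Estimate the false discovery rate among target matches at a score cutoff.

    psms is an iterable of (score, is_decoy) pairs. Decoy hits above the
    cutoff are taken as an estimate of the number of false target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Made-up peptide-spectrum matches for illustration only.
psms = [(9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, True)]
for cutoff in (8.0, 6.0):
    print(cutoff, decoy_fdr(psms, cutoff))
```

Raising the cutoff removes the decoy hit and lowers the estimated error rate at the cost of accepting fewer target matches, which is the trade-off that better probability estimates are meant to manage.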

8.
Large proteomic data sets identifying hundreds or thousands of modified peptides are becoming increasingly common in the literature. Several methods for assessing the reliability of peptide identifications at either the individual peptide or the data set level have become established. However, tools for measuring the confidence of modification site assignments are sparse and are not often employed. A few tools for estimating phosphorylation site assignment reliabilities have been developed, but these are not integral to a search engine, and so require a particular search engine output for a second step of processing. They may also require use of a particular fragmentation method and are mostly only applicable for phosphorylation analysis, rather than post-translational modification analysis in general. In this study, we present the performance of site assignment scoring that is directly integrated into the search engine Protein Prospector, which allows site assignment reliability to be automatically reported for all modifications present in an identified peptide. It clearly indicates when a site assignment is ambiguous (and if so, between which residues), and reports an assignment score that can be translated into a reliability measure for individual site assignments.

9.
Mass spectrometry has made rapid advances in the recent past and has become the preferred method for proteomics. Although many open source algorithms for peptide identification exist, such as X!Tandem and OMSSA, the field has largely been dominated by proprietary software. There is a need for better, freely available, and configurable algorithms that can help in identifying the correct peptides while keeping the false positives to a minimum. We have developed MassWiz, a novel empirical scoring function that gives appropriate weights to major ions, continuity of b-y ions, intensities, and the supporting neutral losses based on the instrument type. We tested MassWiz accuracy on 486,882 spectra from a standard mixture of 18 proteins generated on 6 different instruments downloaded from the Seattle Proteome Center public repository. We compared the MassWiz algorithm with Mascot, Sequest, OMSSA, and X!Tandem at 1% FDR. MassWiz outperformed all in the largest data set (AGILENT XCT) and was second only to Mascot in the other data sets. MassWiz showed good performance in the analysis of high confidence peptides, i.e., those identified by at least three algorithms. We also analyzed a yeast data set containing 106,133 spectra downloaded from the NCBI Peptidome repository and obtained similar results. The results demonstrate that MassWiz is an effective algorithm for high-confidence peptide identification without compromising on the number of assignments. MassWiz is open-source, versatile, and easily configurable.

10.
The peptide sequential assignment algorithm presented here was implemented as a macro within the CONnectivity TRacing ASsignment Tools (CONTRAST) computer software package. The algorithm provides a semi- or fully automated global means of sequentially assigning the NMR backbone resonances of proteins. The program's performance is demonstrated here by its analysis of realistic computer-generated data for IIIGlc, a 168-residue signal-transducing protein of Escherichia coli [Pelton et al. (1991) Biochemistry, 30, 10043–10057]. Missing experimental data (19 resonances) were generated so that a complete assignment set could be tested. The algorithm produces sequential assignments from appropriate peak lists of nD NMR data. It quantifies the ambiguity of each assignment and provides ranked alternatives. A best-first approach, in which high-scoring local assignments are made before and in preference to lower scoring assignments, is shown to be superior (in terms of the current set of CONTRAST scoring routines) to approaches such as simulated annealing that seek to maximize the combined scores of the individual assignments. The robustness of the algorithm was tested by evaluating the effects of imposed frequency imprecision (scatter), added false signals (noise), missing peaks (incomplete data), and variation in user-defined tolerances on the performance of the algorithm.

11.
DBParser: web-based software for shotgun proteomic data analyses (total citations: 1; self-citations: 0; citations by others: 1)
We describe a web-based program called 'DBParser' for rapidly culling, merging, and comparing sequence search engine results from multiple LC-MS/MS peptide analyses. DBParser employs the principle of parsimony to consolidate redundant protein assignments and derive the most concise set of proteins consistent with all of the assigned peptide sequences observed in an experiment or series of experiments. The resulting reports summarize peptide and protein identifications from multidimensional experiments that may contain a single data set or combine data from a group of data sets, all related to a single analytical sample. Additionally, the results of multiple experiments, each of which may contain several data sets, can be compared in reports that identify features that are common or different. DBParser actively links to the primary mass spectral data and to public online databases such as NCBI, GO, and Swiss-Prot in order to structure contextually specific reports for biologists and biochemists.
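
The parsimony principle can be illustrated with a greedy set-cover heuristic. This is a sketch under stated assumptions and is not DBParser's actual algorithm, which the abstract does not detail: the function name, the input mapping from protein to peptide set, and the alphabetical tie-breaking are all hypothetical choices, and a greedy pass is not guaranteed to find the smallest possible protein list.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedily pick proteins until every observed peptide is explained.

    protein_to_peptides maps a protein identifier to the set of peptide
    sequences assigned to it. Ties are broken alphabetically so the result
    is deterministic.
    """
    remaining = set().union(*protein_to_peptides.values())
    chosen = []
    while remaining:
        # Pick the protein that explains the most still-unexplained peptides.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & remaining),
        )
        chosen.append(best)
        remaining -= protein_to_peptides[best]
    return chosen


# Made-up assignments: P2 is redundant because P1 already explains its peptides.
assignments = {
    "P1": {"AAK", "GLR", "TFK"},
    "P2": {"GLR", "TFK"},
    "P3": {"VEK"},
}
print(parsimonious_proteins(assignments))
```

Here the redundant protein is dropped because every peptide it could explain is already accounted for by a protein with stronger evidence.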

12.
This study is an attempt to develop a simple search method for lead peptide candidates, which include constrained structures in a recognized sequence, using the design of a competitive inhibitor for HMG-CoA reductase (HMGR). A structure-functional analysis of previously synthesized peptides proposes that a competitive inhibitory peptide can be designed by maintaining bioactive conformation in a recognized sequence. A conformational aspect of the structure-based approach was applied to the peptide design. By analysis of the projections obtained through a principal component analysis (PCA) for short linear and cyclic peptides, a head-to-tail peptide cycle is considered as a model for its linear analog. It is proposed that activities of the linear peptides based on an identical amino acid sequence, which are obtained from a less flexible peptide cycle, would be relatively higher than those obtained from more flexible cyclic peptides. The design criterion was formulated in terms of a 'V' parameter, reflecting a relative deviation of an individual peptide cycle from an average statistical peptide cycle based on all optimized structures of the cyclic peptides in the set. Twelve peptide cycles were selected for the peptide library. Comparing the calculated 'V' parameters, two cyclic peptides (GLPTGG and GFPTGG) were selected as lead cycles from the library. Based on these sequences, six linear peptides obtained by breaking the cycle at different positions were selected as lead peptide candidates. The linear GFPTGG peptide, showing the highest inhibitory activity against HMGR, increases the inhibitory potency nearly tenfold. Kinetic analysis reveals that the GFPTGG peptide is a competitive inhibitor of HMG-CoA with an equilibrium constant of inhibitor binding (Ki) of 6.4 ± 0.3 μM. Conformational data support a conformation of the designed peptides close to the bioactive conformation of the previously synthesized active peptides.

13.
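An effective algorithm for identifying repetitive sequences (total citations: 1; self-citations: 0; citations by others: 1)
Li Dongdong, Wang Zhengzhi, Ni Qingshan. 《生物信息学》 2005, 3(4): 163-166, 174
The analysis of repetitive sequences is an important topic in genome research, and it depends on finding the repeats in a genome sequence quickly and effectively. A projection-and-assembly algorithm is presented: random projection is used to obtain a set of candidate fragments, and fragment assembly is then used to join the candidates in order to discover the repeats in the genome. The computational complexity of the algorithm is analyzed, semi-simulated test data are constructed, and the test results demonstrate the effectiveness of the algorithm.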
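
The random-projection step for collecting candidate fragments might look like the sketch below. It is an assumption-laden illustration, not the published algorithm: the window length `l`, the number of sampled positions `k`, the single projection round, and the function name are all hypothetical, and the subsequent fragment-assembly step is not shown.

```python
import random
from collections import defaultdict


def candidate_repeats(seq, l=8, k=5, seed=0):
    """Group length-l windows that agree on k randomly chosen positions.

    Windows falling into the same bucket are candidate copies of a repeat,
    including copies that differ at positions the projection ignores.
    Returns a dict mapping each projected key to its window start positions.
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        key = "".join(window[p] for p in positions)
        buckets[key].append(start)
    # Keep only buckets with at least two windows.
    return {key: starts for key, starts in buckets.items() if len(starts) > 1}


# Toy sequence with the same 8-mer at positions 0 and 12.
genome = "ACGTTGCA" + "TTTT" + "ACGTTGCA"
for key, starts in candidate_repeats(genome).items():
    print(key, starts)
```

Projecting onto a subset of positions lets near-identical copies collide in the same bucket, which is why the approach tolerates a few mismatches between repeat copies.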

14.
We recently developed a method for producing comprehensive gene and species phylogenies from unaligned whole genome data using singular value decomposition (SVD) to analyze character string frequencies. This work provides an integrated gene and species phylogeny for 64 vertebrate mitochondrial genomes composed of 832 total proteins. In addition, to provide a theoretical basis for the method, we present a graphical interpretation of both the original frequency matrix and the SVD-derived matrix. These large matrices describe high-dimensional Euclidean spaces within which biomolecular sequences can be uniquely represented as vectors. In particular, the SVD-derived vector space describes each protein relative to a restricted set of newly defined, independent axes, each of which represents a novel form of conserved motif, termed a correlated peptide motif. A quantitative comparison of the relative orientations of protein vectors in this space provides accurate and straightforward estimates of sequence similarity, which can in turn be used to produce comprehensive gene trees. Alternatively, the vector representations of genes from individual species can be summed, allowing species trees to be produced.

15.
We present MassSieve, a Java‐based platform for visualization and parsimony analysis of single and comparative LC‐MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC‐MS/MS‐based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments.

16.
17.
Several methods have been used to identify peptides that correspond to tandem mass spectra. In this work, we describe a data set of low energy tandem mass spectra generated from a control mixture of known protein components that can be used to evaluate the accuracy of these methods. As an example, these spectra were searched by the SEQUEST application against a human peptide sequence database. The numbers of resulting correct and incorrect peptide assignments were then determined. We show how the sensitivity and error rate are affected by the use of various filtering criteria based upon SEQUEST scores and the number of tryptic termini of assigned peptides.

18.
A palindrome is a sequence of characters that reads the same forwards and backwards. Since the discovery of palindromic peptide sequences two decades ago, little effort has been made to understand their structural, functional and evolutionary significance. Therefore, an algorithm has been developed to identify all perfect palindromes (excluding the palindromic subset and tandem repeats) in a single protein sequence. The proposed algorithm does not impose any restriction on the number of residues to be given in the input sequence. This avant-garde algorithm will aid in the identification of palindromic peptide sequences of varying lengths in a single protein sequence.
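
A perfect-palindrome search of this kind can be sketched with the classic expand-around-center technique. This is not the authors' algorithm, which the abstract does not describe: the function name and the `min_len` cutoff are hypothetical, and reporting only the longest palindrome for each center is just one possible reading of "excluding the palindromic subset". Tandem-repeat filtering is not attempted.

```python
def maximal_palindromes(seq, min_len=3):
    """Return (start, palindrome) for the longest palindrome at each center.

    Both odd-length centers (on a residue) and even-length centers (between
    two residues) are tried, so shorter palindromes nested around the same
    center are not reported separately.
    """
    found = []
    n = len(seq)
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        # Grow outwards while the flanking residues match.
        while left >= 0 and right < n and seq[left] == seq[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # half-open interval [start, end)
        if end - start >= min_len:
            found.append((start, seq[start:end]))
    return found


print(maximal_palindromes("MKAYAKL"))
print(maximal_palindromes("GABBAG"))
```

The scan takes time quadratic in the sequence length in the worst case and places no limit on the length of the input, in keeping with the abstract's remark about unrestricted sequence size.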

19.
We propose a method for a posteriori evaluation of classification stability which compares the classification of sites in the original data set (a matrix of species by sites) with classifications of subsets of its sites created by without‐replacement bootstrap resampling. Site assignments to clusters of the original classification and to clusters of the classification of each subset are compared using Goodman‐Kruskal's lambda index. Many resampled subsets are classified and the mean of lambda values calculated for the classifications of these subsets is used as an estimation of classification stability. Furthermore, the mean of the lambda values based on different resampled subsets, calculated for each site of the data set separately, can be used as a measure of the influence of particular sites on classification stability. This method was tested on several artificial data sets classified by commonly used clustering methods and on a real data set of forest vegetation plots. Its strength lies in the ability to distinguish classifications which reflect robust patterns of community differentiation from unstable classifications of more continuous patterns. In addition, it can identify sites within each cluster which have a transitional species composition with respect to other clusters.
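
The comparison of two cluster assignments can be sketched with the asymmetric form of the Goodman-Kruskal lambda, the proportional reduction in error when predicting one labelling from the other. Whether the authors use this asymmetric form or the symmetric one is not stated in the abstract, so that choice, the function name, and the toy labels are assumptions. The resampling and reclassification steps are not shown.

```python
from collections import Counter


def goodman_kruskal_lambda(predictor, target):
    """Asymmetric lambda: how much knowing `predictor` reduces errors in
    guessing `target`, on a scale from 0 (no help) to 1 (perfect)."""
    n = len(target)
    modal_overall = max(Counter(target).values())
    # Largest cell count within each predictor category.
    best_per_group = Counter()
    for (a, _b), count in Counter(zip(predictor, target)).items():
        best_per_group[a] = max(best_per_group[a], count)
    errors_without = n - modal_overall
    if errors_without == 0:
        return 0.0  # target has a single category; nothing to improve
    return (sum(best_per_group.values()) - modal_overall) / errors_without


# Cluster labels of the same four sites in the original and a resampled classification.
original = [0, 0, 1, 1]
print(goodman_kruskal_lambda(original, ["x", "x", "y", "y"]))
print(goodman_kruskal_lambda(original, ["x", "y", "x", "y"]))
```

Because the index depends only on the contingency table, it does not matter that the two classifications label their clusters differently, which is what makes it usable across independently classified subsets.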

20.
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages and by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics from mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.
