Similar Articles
20 similar articles found; search took 15 ms
1.
Hernandez P  Gras R  Frey J  Appel RD 《Proteomics》2003,3(6):870-878
In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current identification methods from spectrometric data have difficulties in handling modifications or mutations in the source peptide. Moreover, they perform poorly on large databases (such as genomic databases) or with low-quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences derived from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves parsing the graph in order to emphasize relevant sections for each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called the Ant Colony Optimization algorithm.
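To make the matching step concrete, the sketch below scores candidate database peptides against an observed MS/MS peak list by counting peaks that fall near theoretical singly charged b- and y-ion m/z values. It is a deliberately simple shared-peak count, not the spectrum-graph parsing or Ant Colony Optimization described above; the function names, the 0.02 Da tolerance, and the restriction to b/y ions are illustrative assumptions.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def theoretical_ions(peptide):
    """Return singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):  # one cleavage site between residues i-1 and i
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


def shared_peak_score(peptide, observed_mz, tol=0.02):
    """Count observed peaks lying within tol (Da) of any theoretical b/y ion."""
    b_ions, y_ions = theoretical_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(1 for mz in observed_mz
               if any(abs(mz - t) <= tol for t in theoretical))


def rank_candidates(candidates, observed_mz, tol=0.02):
    """Order database peptides by decreasing shared-peak score."""
    return sorted(candidates,
                  key=lambda p: shared_peak_score(p, observed_mz, tol),
                  reverse=True)
```

Because L and I have identical residue masses, a score like this cannot separate them; real search engines also consider other charge states, neutral losses, and a statistical model of random matches.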

2.
3.
Electron capture dissociation (ECD) and infrared multiphoton dissociation (IRMPD) present complementary techniques for the fragmentation of peptides and proteins in Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) in addition to the commonly used collisionally activated dissociation (CAD). Both IRMPD and ECD have been shown to enable efficient sequencing of peptides and proteins, whereas ECD has proven especially valuable for mapping labile posttranslational modifications (PTMs), such as phosphorylations. In this work, we compare the different fragmentation techniques and MS detection in a linear ion trap and the ICR cell with respect to their abilities to efficiently identify and characterize phosphorylated peptides. For optimizing fragmentation parameters, sets of synthetic peptides with molecular weights ranging from approximately 1 to 4 kDa and different levels of phosphorylation were analyzed. The influence of spectrum averaging on obtaining high-quality spectra was investigated. Our results show that the fragmentation methods CAD and ECD facilitate the analysis of phosphopeptides; however, their general applicability for analyzing phosphopeptides has to be evaluated in each specific case with respect to the given analytical task. The major advantage of complementary peptide cleavages by combining different fragmentation methods is the increased amount of information that is obtained during MS/MS analysis of modified peptides. On the basis of the obtained results, we are planning to design LC time-scale compatible, data-dependent MS/MS methods using the different fragmentation techniques in order to improve the identification and characterization of phosphopeptides.

4.
Barsnes H  Eidhammer I  Martens L 《Proteomics》2011,11(6):1181-1188
Understanding the fragmentation process in MS/MS experiments is vital when trying to validate the results of such experiments, and one way of improving our understanding is to analyze existing data. We here present our findings from an analysis of a large and diverse data set of MS/MS-based peptide identifications, in which each peptide has been identified from multiple spectra, recorded on two commonly used types of electrospray instruments. By analyzing these data we were able to study fragmentation variability on three levels: (i) variation in detection rates and intensities for fragment ions from the same peptide sequence measured multiple times on a single instrument; (ii) consistency of rank-based fragmentation patterns; and (iii) a set of general observations on fragment ion occurrence in MS/MS experiments, regardless of sequence. Our results confirm that substantial variation can be found at all levels, even when high-quality identifications are used and the experimental conditions as well as the peptide sequences are kept constant. Finally, we discuss the observed variability in light of ongoing efforts to create spectral libraries and predictive software for target selection in targeted proteomics.
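One way to quantify the consistency of rank-based fragmentation patterns is a rank correlation between the fragment intensities recorded for the same peptide in two spectra. The sketch below uses Spearman correlation over fragments detected in both spectra; the abstract does not say which statistic the authors used, so this choice and the helper names are assumptions.

```python
def average_ranks(values):
    """Ranks starting at 1 (smallest value); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # positions i..j hold tied values
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when every intensity is tied
    return cov / (vx * vy) ** 0.5


def rank_consistency(spec_a, spec_b):
    """Spearman correlation of fragment intensities shared by two spectra.

    Spectra are dicts mapping a fragment label (e.g. "y4") to its intensity.
    Fragments seen in only one spectrum are ignored here, although missing
    fragments are themselves a form of variability (detection rate)."""
    shared = sorted(set(spec_a) & set(spec_b))
    return spearman([spec_a[f] for f in shared], [spec_b[f] for f in shared])
```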

5.
Chor B  Snir S 《Systematic biology》2004,53(6):963-967
Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM) are used in order to find optimal parameters for a given tree. So far, analytic solutions were derived only for the simplest model: three taxa, two-state characters, under a molecular clock. Quoting Ziheng Yang, who initiated the analytic approach, "this seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogenetic estimation." In this work, we give general analytic solutions for a family of trees with four taxa, two-state characters, under a molecular clock. The change from three to four taxa incurs a major increase in the complexity of the underlying algebraic system, and requires novel techniques and approaches. We start by presenting the general maximum likelihood problem on phylogenetic trees as a constrained optimization problem, and the resulting system of polynomial equations. In full generality, it is infeasible to solve this system; therefore, specialized tools for the molecular clock case are developed. Four-taxa rooted trees have two topologies: the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). We combine the ultrametric properties of molecular clock fork trees with the Hadamard conjugation to derive a number of topology-dependent identities. Employing these identities, we substantially simplify the system of polynomial equations for the fork. We finally employ symbolic algebra software to obtain closed-form analytic solutions (expressed parametrically in the input data). In general, four-taxa trees can have multiple ML points. In contrast, we can now prove that each fork topology has a unique (local and global) ML point.
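The Hadamard conjugation mentioned above can be illustrated for two-state characters under the symmetric (Cavender-Farris-Neyman) model. In the Hendy and Penny formulation, an edge-length spectrum γ is mapped to expected site-pattern frequencies by s = H^{-1} exp(Hγ) and back by γ = H^{-1} ln(Hs), where H is a Sylvester Hadamard matrix of order 2^(n-1). The sketch below is a minimal pure-Python version under those assumptions (splits encoded as bitmasks over taxa 1..n-1, taxon n as the reference); it does not reproduce the topology-dependent identities or the ML solutions derived in the paper.

```python
import math


def hadamard(n_bits):
    """Sylvester Hadamard matrix of order 2**n_bits: H[a][b] = (-1)**popcount(a & b)."""
    size = 1 << n_bits
    return [[-1 if bin(a & b).count("1") % 2 else 1 for b in range(size)]
            for a in range(size)]


def mat_vec(m, v):
    """Plain matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def spectrum_from_edges(n_taxa, edge_lengths):
    """Forward conjugation s = H^-1 exp(H gamma) for two-state characters.

    edge_lengths maps each split (bitmask of the side without taxon n; taxon k
    is bit k-1) to q_e = -0.5 * ln(1 - 2 p_e), p_e being the change probability.
    Entry s[A] is the probability that exactly the taxa in A differ from taxon n."""
    h = hadamard(n_taxa - 1)
    size = len(h)
    gamma = [0.0] * size
    for split, q in edge_lengths.items():
        gamma[split] = q
    gamma[0] = -sum(gamma[1:])  # empty-set entry: minus the total tree length
    rho = [math.exp(x) for x in mat_vec(h, gamma)]
    return [x / size for x in mat_vec(h, rho)]  # H^-1 = H / 2**(n-1)


def edges_from_spectrum(s):
    """Inverse conjugation gamma = H^-1 ln(H s); needs every entry of H s > 0."""
    size = len(s)
    h = hadamard(size.bit_length() - 1)
    r = [math.log(x) for x in mat_vec(h, s)]
    return [x / size for x in mat_vec(h, r)]
```

For a four-taxon fork with internal split {1,2}, the splits are 1, 2, 4 and 7 (pendant edges of taxa 1 to 4) and 3 (internal edge); the inverse transform returns those lengths and zero for the incompatible splits 5 and 6.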

6.

Introduction

Tandem mass spectrometry (MS/MS) has been widely used for identifying metabolites in many areas. However, computationally identifying metabolites from MS/MS data is challenging because the fragmentation rules, which determine the precedence of chemical bond dissociation, are unknown. Although this problem has been tackled in different ways, the lack of computational tools that can flexibly represent the structures adjacent to chemical bonds remains a long-standing bottleneck for studying fragmentation rules.

Objectives

This study aimed to develop computational methods for investigating fragmentation rules by analyzing annotated MS/MS data.

Methods

We implemented a computational platform, MIDAS-G, for investigating fragmentation rules. MIDAS-G represents a metabolite as a simple graph and uses graph grammars to recognize specific chemical bonds and their adjacent structures. MIDAS-G can be used to investigate fragmentation rules by adjusting bond weights in the scoring model of the metabolite identification tool and comparing the resulting metabolite identification performance.

Results

Using real annotated MS/MS data, we applied MIDAS-G to investigate four bond types. The experimental results agreed with data collected from wet labs and the literature, confirming the effectiveness of MIDAS-G.

Conclusion

We developed a computational platform for investigating fragmentation rules of tandem mass spectrometry. This platform is freely available for download.
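The idea of recognizing a bond together with its neighborhood and weighting it in a scoring model can be sketched as follows. Here a metabolite is stored as a simple graph, and a hand-written Python predicate stands in for a graph-grammar production: it recognizes a single C-O bond whose carbon also carries a C=O (an ester-type bond). The rule, its weight of 0.3, and the cost-based ranking are hypothetical illustrations, not MIDAS-G's actual grammar or scoring model.

```python
from collections import defaultdict


class Molecule:
    """A metabolite as a simple graph: atoms are vertices, bonds are edges."""

    def __init__(self, atoms, bonds):
        self.atoms = atoms  # index -> element symbol, e.g. ["C", "O", ...]
        self.bonds = bonds  # list of (i, j, bond_order)
        self.adj = defaultdict(list)
        for i, j, order in bonds:
            self.adj[i].append((j, order))
            self.adj[j].append((i, order))


def is_ester_c_o(mol, bond):
    """Hypothetical rule: a single C-O bond whose carbon also has a C=O."""
    i, j, order = bond
    if order != 1 or {mol.atoms[i], mol.atoms[j]} != {"C", "O"}:
        return False
    carbon = i if mol.atoms[i] == "C" else j
    other_o = j if carbon == i else i
    return any(mol.atoms[n] == "O" and o == 2 and n != other_o
               for n, o in mol.adj[carbon])


def bond_cost(mol, bond, rules, default=1.0):
    """Cleavage cost: weight of the first matching rule, else the default."""
    for rule, weight in rules:
        if rule(mol, bond):
            return weight
    return default


def rank_cleavages(mol, rules):
    """Bonds ordered from most to least favored cleavage (lowest cost first)."""
    return sorted(mol.bonds, key=lambda b: bond_cost(mol, b, rules))
```

For methyl acetate, CH3-C(=O)-O-CH3, only the acyl C-O bond matches the rule, so it is ranked first. Changing a rule's weight and re-running identification is the kind of comparison the Methods describe.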

7.
This work deals with symbolic mathematical solutions to maximum likelihood on small phylogenetic trees. Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. In this work, we give general analytic solutions for a family of trees with four taxa, two-state characters, under a molecular clock. Previously, analytical solutions were known only for three-taxa trees. The change from three to four taxa incurs a major increase in the complexity of the underlying algebraic system, and requires novel techniques and approaches. Despite the simplicity of our model, solving ML analytically in it is close to the limit of today's tractability. Four-taxa rooted trees have two topologies: the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). Combining the properties of molecular clock fork trees with the Hadamard conjugation, and employing the symbolic algebra software Maple, we derive a number of topology-dependent identities. Using these identities, we substantially simplify the system of polynomial equations for the fork. We finally employ the symbolic algebra software to obtain closed-form analytic solutions (expressed parametrically in the input data).

8.
Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM) are used in order to find optimal parameters for a given tree. So far, analytic solutions were derived only for the simplest model: three taxa, two-state characters, under a molecular clock. Four-taxa rooted trees have two topologies: the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). In a previous work, we devised a closed-form analytic solution for the ML molecular clock fork. In this work, we extend the state of the art in the area of analytic solutions for ML trees to the family of all four-taxa trees under the molecular clock assumption. The change from the fork topology to the comb incurs a major increase in the complexity of the underlying algebraic system and requires novel techniques and approaches. We combine the ultrametric properties of molecular clock trees with the Hadamard conjugation to derive a number of topology-dependent identities. Employing these identities, we substantially simplify the system of polynomial equations. We finally use tools from algebraic geometry (e.g., Gröbner bases, ideal saturation, resultants) and employ symbolic algebra software to obtain analytic solutions for the comb. We show that in contrast to the fork, the comb has no closed-form solutions (expressed by radicals in the input data). In general, four-taxa trees can have multiple ML points. In contrast, we can now prove that under the molecular clock assumption, the comb has a unique (local and global) ML point. (Such uniqueness was previously shown for the fork.)

9.
A comprehensive phylogeny of papilionoid legumes was inferred from sequences of 2228 taxa in GenBank release 147. A semiautomated analysis pipeline was constructed to download, parse, assemble, align, combine, and build trees from a pool of 11,881 sequences. Initial steps included all-against-all BLAST similarity searches coupled with assembly, using a novel strategy for building length-homogeneous primary sequence clusters. This was followed by a combination of global and local alignment protocols to build larger secondary clusters of locally aligned sequences, thus taking into account the dramatic differences in length of the heterogeneous coding and noncoding sequence data present in GenBank. Next, clusters were checked for the presence of duplicate genes and other potentially misleading sequences and examined for combinability with other clusters on the basis of taxon overlap. Finally, two supermatrices were constructed: a "sparse" matrix based on the primary clusters alone (1794 taxa x 53,977 characters), and a somewhat more "dense" matrix based on the secondary clusters (2228 taxa x 33,168 characters). Both matrices were very sparse, with 95% of their cells containing gaps or question marks. These were subjected to extensive heuristic parsimony analyses using deterministic and stochastic heuristics, including bootstrap analyses. A "reduced consensus" bootstrap analysis was also performed to detect cryptic signal in a subtree of the data set corresponding to a "backbone" phylogeny proposed in previous studies. Overall, the dense supermatrix appeared to provide much more satisfying results, indicated by better resolution of the bootstrap tree, excellent agreement with the backbone papilionoid tree in the reduced bootstrap consensus analysis, few problematic large polytomies in the strict consensus, and less fragmentation of conventionally recognized genera. Nevertheless, at lower taxonomic levels several problems were identified and diagnosed. 
A large number of methodological issues in supermatrix construction at this scale are discussed, including detection of annotation errors in GenBank sequences; the shortage of effective algorithms and software for local multiple sequence alignment; the difficulty of overcoming effects of fragmentation of data into nearly disjoint blocks in sparse supermatrices; and the lack of informative tools to assess confidence limits in very large trees.
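The sparsity figure quoted above (95% of cells containing gaps or question marks) is the fraction of empty cells after concatenating aligned clusters, with taxa missing from a cluster padded by "?". A minimal sketch, assuming each cluster is already aligned so that all of its sequences have equal length; the function names are hypothetical.

```python
def build_supermatrix(clusters, missing="?"):
    """Concatenate aligned clusters (dicts: taxon -> aligned sequence).

    Taxa absent from a cluster are filled with `missing` across that block."""
    taxa = sorted({t for cluster in clusters for t in cluster})
    rows = {t: [] for t in taxa}
    for cluster in clusters:
        width = len(next(iter(cluster.values())))  # assumes equal-length rows
        for t in taxa:
            rows[t].append(cluster.get(t, missing * width))
    return {t: "".join(parts) for t, parts in rows.items()}


def sparsity(matrix, empty="-?"):
    """Fraction of cells that are alignment gaps or missing data."""
    cells = "".join(matrix.values())
    return sum(ch in empty for ch in cells) / len(cells)
```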

10.
SUMMARY: QDist is a program for computing the quartet distance between two unrooted trees, i.e. the number of quartet topology differences between the trees, where a quartet topology is the topological subtree induced by four species. The program is based on an algorithm with running time O(n log² n), which makes it practical to compare large trees. Available under GNU license. AVAILABILITY: http://www.birc.dk/Software/QDist
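For small trees the quartet distance can be computed directly from its definition, which is useful for checking a faster implementation. The sketch below encodes each unrooted tree (written as nested tuples) by its nontrivial splits and compares the induced topology of every four-leaf subset. This naive version takes O(n^4) time per tree pair and is not QDist's O(n log² n) algorithm; the function names are illustrative.

```python
from itertools import combinations


def leaves(tree):
    """Leaf labels of a tree given as nested tuples of strings."""
    return {tree} if isinstance(tree, str) else set().union(*map(leaves, tree))


def splits(tree):
    """Nontrivial splits, each stored as the side without a reference leaf."""
    all_leaves = frozenset(leaves(tree))
    ref = min(all_leaves)
    result = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        if 1 < len(clade) < len(all_leaves) - 1:
            result.add(clade if ref not in clade else all_leaves - clade)
        return clade

    walk(tree)
    return result


def quartet_topology(split_set, a, b, c, d):
    """Pairing induced on {a,b,c,d}, e.g. {{a,b},{c,d}}, or None if unresolved."""
    for s in split_set:
        for x, y, z, w in ((a, b, c, d), (a, c, b, d), (a, d, b, c)):
            if ({x, y} <= s and not {z, w} & s) or ({z, w} <= s and not {x, y} & s):
                return frozenset([frozenset([x, y]), frozenset([z, w])])
    return None


def quartet_distance(t1, t2):
    """Number of four-leaf subsets whose induced topologies differ."""
    s1, s2 = splits(t1), splits(t2)
    return sum(quartet_topology(s1, *q) != quartet_topology(s2, *q)
               for q in combinations(sorted(leaves(t1)), 4))
```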

11.
The goal of metabolic flux analysis (MFA) is the accurate estimation of intracellular fluxes in metabolic networks. Here, we introduce a new method for MFA based on tandem mass spectrometry (MS) and stable-isotope tracer experiments. We demonstrate that tandem MS provides more labeling information than can be obtained from traditional full scan MS analysis and allows estimation of fluxes with better precision. We present a modeling framework that takes full advantage of the additional labeling information obtained from tandem MS for MFA. We show that tandem MS data can be computed for any network model, any compound and any tandem MS fragmentation using linear mapping of isotopomers. The inherent advantages of tandem MS were illustrated in two network models using simulated and literature data. Application of tandem MS increased the observability of the models and improved the precision of estimated fluxes by 2- to 5-fold compared to traditional MS analysis.
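The linear mapping of isotopomers can be shown on a toy three-carbon compound. Each labeling pattern (isotopomer) contributes its abundance to exactly one cell of the measured distribution, so both full-scan and tandem MS measurements are 0/1 linear maps of the isotopomer vector. The sketch ignores natural isotope abundance and assumes the carbons retained in the precursor and product ions are known; the names and example mixtures are hypothetical.

```python
from itertools import product


def isotopomers(n_carbons):
    """All labeling patterns of n carbons as bit tuples (1 = 13C)."""
    return list(product((0, 1), repeat=n_carbons))


def ms_mid(abundance, atoms):
    """Full-scan MS: mass isotopomer distribution (M+0, M+1, ...) of an ion."""
    mid = [0.0] * (len(atoms) + 1)
    for pattern, frac in abundance.items():
        mid[sum(pattern[a] for a in atoms)] += frac
    return mid


def tandem_mid(abundance, precursor_atoms, product_atoms):
    """Tandem MS: joint distribution over (precursor M+i, product M+j)."""
    table = [[0.0] * (len(product_atoms) + 1)
             for _ in range(len(precursor_atoms) + 1)]
    for pattern, frac in abundance.items():
        i = sum(pattern[a] for a in precursor_atoms)
        j = sum(pattern[a] for a in product_atoms)
        table[i][j] += frac  # each isotopomer lands in exactly one cell
    return table
```

Two mixtures labeled at carbon 1 or at carbon 3 give the same full-scan distribution for the intact ion, but different precursor/product tables when the product ion keeps carbons 1 and 2. This is the kind of extra labeling information the abstract attributes to tandem MS.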

12.
The Cavender-Felsenstein edge-length invariants for binary characters on 4-trees provide the starting point for the development of "customized" invariants for evaluating and comparing phylogenetic hypotheses. The binary character invariants may be generalized to k-valued characters without losing the quadratic nature of the invariants as functions of the theoretical frequencies f(UVXY) of observable character configurations (U at organism 1, V at 2, etc.). The key to the approach is that certain sets of these configurations constitute events which are probabilistically independent from other such sets, under the symmetric Markov change models studied. By introducing more complex sets of configurations, we find the quadratic invariants for 5-trees in the binary model and for individual edges in 6-trees or, indeed, in any size tree. The same technique allows us to formulate invariants for entire trees, but these are cubic functions for 6-trees and are higher-degree polynomials for larger trees. With k-valued characters and, especially, with large trees, the types of configuration sets (events) used in the simpler examples are too rare (i.e., their predicted frequencies are too low) to be useful, and the construction of meaningful pairs of independent events becomes an important and nontrivial task in designing invariants suited to testing specific hypotheses. In a very natural way, this approach fits in with well-known statistical methodology for contingency tables. We explore use of events such as "only transitions occur for character i (i.e., position i in a nucleic acid sequence) in subtree a" in analyzing a set of data on ribosomal RNA in the context of the controversy over the origins of archaebacteria, eubacteria, and eukaryotes.

13.
A Maximum Agreement SubTree (MAST) is a largest subtree common to a set of trees and serves as a summary of common substructure in the trees. A single MAST can be misleading, however, since there can be an exponential number of MASTs, and two MASTs for the same tree set do not even necessarily share any leaves. In this paper, we introduce the notion of the Kernel Agreement SubTree (KAST), which is the summary of the common substructure in all MASTs, and show that it can be calculated in polynomial time (for trees with bounded degree). Suppose the input trees represent competing hypotheses for a particular phylogeny. We explore the utility of the KAST as a method to discern the common structure of confidence, and as a measure of how confident we are in a given tree set. We also show the trend of the KAST, as compared to other consensus methods, on the set of all trees visited during a Bayesian analysis of flatworm genomes.
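For very small inputs the MASTs can be found by brute force, which makes the point about multiple MASTs easy to see. The sketch restricts rooted trees (nested tuples) to every leaf subset, keeps the largest subsets on which all trees agree, and reports the leaves common to all of them. Taking that shared leaf set is a simplified stand-in chosen for illustration; it is not the paper's polynomial-time KAST construction, and the running time here is exponential.

```python
from itertools import combinations


def leaf_set(tree):
    return {tree} if isinstance(tree, str) else set().union(*map(leaf_set, tree))


def restrict(tree, keep):
    """Prune leaves outside `keep` and suppress nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def canon(tree):
    """Order-independent form, so (A,B) and (B,A) compare equal."""
    return tree if isinstance(tree, str) else frozenset(canon(c) for c in tree)


def all_masts(trees):
    """Every largest leaf set on which all trees agree (exponential search)."""
    taxa = sorted(set.intersection(*(leaf_set(t) for t in trees)))
    for size in range(len(taxa), 0, -1):
        found = [set(s) for s in combinations(taxa, size)
                 if len({canon(restrict(t, set(s))) for t in trees}) == 1]
        if found:
            return found
    return []


def kernel_leaves(trees):
    """Leaves shared by every MAST (a simplified stand-in for the KAST)."""
    masts = all_masts(trees)
    return set.intersection(*masts) if masts else set()
```

For example, (((A,B),C),D) and (((A,C),B),D) have three MASTs of three leaves each, and only D is common to all of them.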

14.
The "neighbor-joining algorithm" is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the neighbor-joining transformation, which uses estimates of phylogenetic diversity rather than pairwise distances in the tree. This leads to an improved neighbor-joining algorithm whose total running time is still polynomial in the number of taxa. On simulated data, the method outperforms other distance-based methods. We have implemented neighbor-joining for subtree weights in a program called MJOIN which is freely available under the Gnu Public License at http://bio.math.berkeley.edu/mjoin/.
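For reference, the classical pairwise neighbor-joining step of Saitou and Nei, which the abstract generalizes, joins the pair minimizing Q(i, j) = (n - 2)d(i, j) - Σ_k d(i, k) - Σ_k d(j, k) and then recomputes distances to the new node. The sketch below implements that pairwise version in pure Python; it is not MJOIN's subtree-weight generalization. The output format (nested pairs of subtree and branch length, rooted at the midpoint of the last edge) is an arbitrary choice.

```python
def neighbor_joining(names, dist):
    """Classic neighbor-joining on a symmetric distance matrix (list of lists).

    Returns nested tuples of (subtree, branch_length) pairs."""
    nodes = list(names)
    d = [row[:] for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Q-criterion: the pair minimizing (n-2)d(i,j) - r_i - r_j is joined.
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new internal node to every remaining node.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [((nodes[i], li), (nodes[j], lj))]
    half = d[0][1] / 2  # place a root at the midpoint of the final edge
    return ((nodes[0], half), (nodes[1], half))
```

On an additive distance matrix, such as the five-taxon example often used to teach the method, this procedure recovers the generating tree and its branch lengths.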

15.
The identification of large series of metabolites detectable by mass spectrometry (MS) in crude extracts is a challenging task. In order to test and apply the so-called multistage mass spectrometry (MS^n) spectral tree approach as a tool for metabolite identification in complex sample extracts, we first performed liquid chromatography (LC) with online electrospray ionization (ESI)-MS^n, using crude extracts from both tomato fruit and Arabidopsis leaf. Second, the extracts were automatically fractionated by a NanoMate LC-fraction collector/injection robot (Advion), and selected LC fractions were subsequently analyzed using nanospray direct infusion to generate offline in-depth MS^n spectral trees at high mass resolution. Characterization and subsequent annotation of metabolites was achieved by detailed analysis of the MS^n spectral trees, focusing on two major plant secondary metabolite classes: phenolics and glucosinolates. Following this approach, we were able to discriminate all selected flavonoid glycosides, based on their unique MS^n fragmentation patterns in either negative or positive ionization mode. As a proof of principle, we report here 127 annotated metabolites in the tomato and Arabidopsis extracts, including 21 novel metabolites. Our results indicate that online LC-MS^n fragmentation in combination with databases of in-depth spectral trees generated offline can provide a fast and reliable characterization and annotation of metabolites present in complex crude extracts such as those from plants.

16.
We present a protocol for the identification of glycosylated proteins in plasma followed by elucidation of their individual glycan compositions. The study of glycoproteins by mass spectrometry is usually based on cleavage of glycans followed by separate analysis of glycans and deglycosylated proteins, which limits the ability to derive glycan compositions for individual glycoproteins. The methodology described here consists of 2D HPLC fractionation of intact proteins and liquid chromatography-multistage tandem mass spectrometry (LC-MS/MS(n)) analysis of digested protein fractions. Protein samples are separated by 1D anion-exchange chromatography (AEX) with an eight-step salt elution. Protein fractions from each of the eight AEX elution steps are transferred onto the 2D reversed-phase column to further separate proteins. A digital ion trap mass spectrometer with a wide mass range is then used for LC-MS/MS(n) analysis of intact glycopeptides from the 2D HPLC fractions. Both peptide and oligosaccharide compositions are revealed by analysis of the ion fragmentation patterns of glycopeptides with an intact glycopeptide analysis pipeline.
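Once a glycopeptide and its peptide backbone have been identified, candidate oligosaccharide compositions can be proposed from the mass difference. A minimal brute-force sketch, assuming monoisotopic residue masses for HexNAc, Hex, Fuc and NeuAc, a 0.02 Da tolerance, and at most ten residues of each type; all of these are illustrative choices, not parameters from the protocol.

```python
from itertools import product

# Monoisotopic residue masses (Da) of common monosaccharides.
RESIDUES = {"HexNAc": 203.07937, "Hex": 162.05282,
            "Fuc": 146.05791, "NeuAc": 291.09542}


def glycan_compositions(glycan_mass, tol=0.02, max_count=10):
    """Enumerate residue counts whose summed mass matches glycan_mass within tol."""
    names = list(RESIDUES)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        mass = sum(c * RESIDUES[n] for c, n in zip(counts, names))
        if abs(mass - glycan_mass) <= tol:
            hits.append(dict(zip(names, counts)))
    return hits
```

With a wide tolerance several compositions can fit the same mass, which is why fragment ions from the MS^n spectra are needed to confirm a candidate.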

17.
Mass spectrometry (MS) analysis of peptides carrying post‐translational modifications is challenging due to the instability of some modifications during MS analysis. However, glycopeptides as well as acetylated, methylated and other modified peptides release specific fragment ions during CID (collision‐induced dissociation) and HCD (higher energy collisional dissociation) fragmentation. These fragment ions can be used to validate the presence of the PTM on the peptide. Here, we present PTM MarkerFinder, a software tool that takes advantage of such marker ions. PTM MarkerFinder screens the MS/MS spectra in the output of a database search (i.e., Mascot) for marker ions specific for selected PTMs. Moreover, it reports and annotates the HCD and the corresponding electron transfer dissociation (ETD) spectrum (when present), and summarizes information on the type, number, and ratios of marker ions found in the data set. In the present work, a sample containing enriched N‐acetylhexosamine (HexNAc) glycopeptides from yeast has been analyzed by liquid chromatography‐mass spectrometry on an LTQ Orbitrap Velos using both HCD and ETD fragmentation techniques. The identification result (Mascot .dat file) was submitted as input to PTM MarkerFinder and screened for HexNAc oxonium ions. The software output has been used for high‐throughput validation of the identification results.
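The core screening step can be sketched as a tolerance search for known oxonium ions in each MS/MS peak list. The m/z values below are commonly reported HexNAc oxonium ions; the 10 ppm tolerance, the two-marker threshold and the function names are assumptions, and reading Mascot .dat files is left out. This is not PTM MarkerFinder's implementation.

```python
# Commonly reported HexNAc oxonium ions (m/z, singly charged).
HEXNAC_MARKERS = {
    "HexNAc": 204.0867,
    "HexNAc-H2O": 186.0761,
    "HexNAc-2H2O": 168.0655,
    "m/z 144": 144.0655,
    "m/z 138": 138.0550,
    "m/z 126": 126.0550,
}


def find_markers(peaks, markers=HEXNAC_MARKERS, tol_ppm=10.0):
    """Return {marker name: intensity} for markers present in a peak list.

    peaks is a list of (mz, intensity); the most intense match is kept."""
    found = {}
    for name, ref in markers.items():
        window = ref * tol_ppm * 1e-6
        matches = [inten for mz, inten in peaks if abs(mz - ref) <= window]
        if matches:
            found[name] = max(matches)
    return found


def screen_spectra(spectra, min_markers=2, **kwargs):
    """IDs of spectra containing at least min_markers distinct marker ions."""
    return [sid for sid, peaks in spectra.items()
            if len(find_markers(peaks, **kwargs)) >= min_markers]
```

Ratios between marker intensities, which the tool also summarizes, can be computed from the dictionary returned by `find_markers`.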

18.
19.
《MABS-AUSTIN》2013,5(8):1351-1357
ABSTRACT

The analysis of monoclonal antibodies (mAbs) by a middle-down mass spectrometry (MS) approach is a growing field that attracts the attention of many researchers and biopharmaceutical companies. Usually, liquid fractionation techniques are used to separate mAb polypeptide chains before MS analysis. Gas-phase fractionation techniques such as high-field asymmetric waveform ion mobility spectrometry (FAIMS) can replace liquid-based separations and reduce both analysis time and cost. Here, we present a rapid FAIMS tandem MS method capable of characterizing the polypeptide sequence of mAb light and heavy chains in an unprecedented, easy, and fast fashion. This new method uses commercially available instruments and takes ~24 min, which is 40-60% faster than regular liquid chromatography-MS/MS analysis, to acquire fragmentation data using different dissociation methods.

20.
A thorough understanding of the fragmentation processes in MS/MS can be a powerful tool in assessing the resulting peptide and protein identifications. We here present the freely available, open‐source FragmentationAnalyzer tool ( http://fragmentation‐analyzer.googlecode.com ) that makes it straightforward to analyze large MS/MS data sets for specific types of identified peptides, using a common set of peptide properties. This enables the detection of fragmentation pattern nuances related to specific instruments or due to the presence of post‐translational modifications.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417