Similar articles
Found 20 similar articles (search time: 78 ms)
1.
Hernandez P, Gras R, Frey J, Appel RD. Proteomics 2003, 3(6): 870-878
In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current identification methods from spectrometric data have difficulties in handling modifications or mutations in the source peptide. Moreover, they have low performance when run on large databases (such as genomic databases), or with low quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences issued from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves the parsing of the graph in order to emphasize relevant sections for each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called Ant Colony Optimization algorithm.
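The spectrum-graph representation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm (there is no ant colony optimization and no scoring). It only assumes that nodes are idealized prefix masses and that an edge joins two nodes whose mass difference matches an amino acid residue mass within a tolerance. The residue table is a small subset of standard monoisotopic masses; the function names, the tolerance, and the example masses are hypothetical.

```python
# Monoisotopic residue masses (Da) for a few amino acids; a subset for illustration.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
}

def build_spectrum_graph(masses, tol=0.01):
    """Return sorted nodes and edges (i, j, residue) where nodes[j] - nodes[i] matches a residue mass."""
    nodes = sorted(masses)
    edges = []
    for i, low in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            diff = nodes[j] - low
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges.append((i, j, residue))
    return nodes, edges

def read_sequences(nodes, edges):
    """Enumerate the residue strings along every path from the first node to the last."""
    outgoing = {}
    for i, j, residue in edges:
        outgoing.setdefault(i, []).append((j, residue))
    results = []

    def walk(node, prefix):
        if node == len(nodes) - 1:
            results.append(prefix)
            return
        for nxt, residue in outgoing.get(node, []):
            walk(nxt, prefix + residue)

    walk(0, "")
    return results

# Idealized prefix masses for the hypothetical peptide "GAV", starting from mass 0.
prefix_masses = [0.0, 57.02146, 128.05857, 227.12698]
nodes, edges = build_spectrum_graph(prefix_masses)
sequences = read_sequences(nodes, edges)
```

Real spectra contain noise peaks, missing peaks, and several ion types, so a practical graph has many competing paths; that is where a search heuristic such as the one described in the abstract would come in.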

2.
A graph-theory algorithm for rapid protein side-chain prediction   Cited by: 19 (self-citations: 0, citations by others: 19)
Fast and accurate side-chain conformation prediction is important for homology modeling, ab initio protein structure prediction, and protein design applications. Many methods have been presented, although only a few computer programs are publicly available. The SCWRL program is one such method and is widely used because of its speed, accuracy, and ease of use. A new algorithm for SCWRL is presented that uses results from graph theory to solve the combinatorial problem encountered in the side-chain prediction problem. In this method, side chains are represented as vertices in an undirected graph. Any two residues that have rotamers with nonzero interaction energies are considered to have an edge in the graph. The resulting graph can be partitioned into connected subgraphs with no edges between them. These subgraphs can in turn be broken into biconnected components, which are graphs that cannot be disconnected by removal of a single vertex. The combinatorial problem is reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation. This algorithm is able to complete predictions on a set of 180 proteins with 34342 side chains in <7 min of computer time. The total chi(1) and chi(1 + 2) dihedral angle accuracies are 82.6% and 73.7% using a simple energy function based on the backbone-dependent rotamer library and a linear repulsive steric energy. The new algorithm will allow for use of SCWRL in more demanding applications such as sequence design and ab initio structure prediction, as well as the addition of a more complex energy function and conformational flexibility, leading to increased accuracy.
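The graph decomposition described in this abstract can be sketched as follows. This is only an illustration of the two graph-theory steps (connected subgraphs, then the single vertices whose removal would disconnect a subgraph), not the SCWRL implementation; there is no energy function here, and the residue labels and interacting pairs are hypothetical.

```python
def build_graph(pairs):
    """Undirected residue graph: an edge means two residues have interacting rotamers."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def connected_components(graph):
    """Split the graph into connected subgraphs with no edges between them."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return components

def articulation_points(graph):
    """Vertices whose removal disconnects their component (depth-first search with low-links)."""
    depth, low, cut = {}, {}, set()

    def dfs(node, parent, d):
        depth[node] = low[node] = d
        children = 0
        for nb in graph[node]:
            if nb == parent:
                continue
            if nb in depth:
                low[node] = min(low[node], depth[nb])
            else:
                children += 1
                dfs(nb, node, d + 1)
                low[node] = min(low[node], low[nb])
                if parent is not None and low[nb] >= depth[node]:
                    cut.add(node)
        if parent is None and children > 1:
            cut.add(node)

    for node in graph:
        if node not in depth:
            dfs(node, None, 0)
    return cut

# Hypothetical interacting pairs: two triangles sharing R3, plus an isolated pair.
pairs = [("R1", "R2"), ("R2", "R3"), ("R1", "R3"),
         ("R3", "R4"), ("R4", "R5"), ("R3", "R5"),
         ("R6", "R7")]
graph = build_graph(pairs)
components = connected_components(graph)
cut = articulation_points(graph)
```

Here R3 is the only articulation point, so the larger subgraph splits into two biconnected components that share R3; each could then be minimized separately for every rotamer choice of R3.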

3.
Mass spectrometry-based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph-based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.

4.
"Protein Side-chain Packing" has an ever-increasing application in the field of bioinformatics, dating from the early methods of homology modeling to protein design and protein docking. However, this problem is known to be NP-hard. In this regard, we have developed a novel approach to solve this problem using the notion of a maximum edge-weight clique. Our approach is based on an efficient reduction of the protein side-chain packing problem to a graph and then solving the reduced graph to find the maximum clique by applying an efficient clique-finding algorithm developed by our co-authors. Since our approach is based on deterministic algorithms, in contrast to the various existing algorithms based on heuristic approaches, our algorithm is guaranteed to find an optimal solution. We have tested this approach to predict the side-chain conformations of a set of proteins and have compared the results with other existing methods. We have found that our results are comparable to or better than the results produced by the existing methods. As our test set contains a protein of 494 residues, we have obtained considerable improvement in terms of the size of the proteins and in terms of the efficiency and the accuracy of prediction.

5.
Sistla RK, K V B, Vishveshwara S. Proteins 2005, 59(3): 616-626
We present a novel method for the identification of structural domains and domain interface residues in proteins by a graph spectral method. This method converts the three-dimensional structure of the protein into a graph by using atomic coordinates from the PDB file. Domain definitions are obtained by constructing either a protein backbone graph or a protein side-chain graph. The graph is constructed based on the interactions between amino acid residues in the three-dimensional structure of the proteins. The spectral parameters of such a graph contain information regarding the domains and subdomains in the protein structure. This is based on the fact that the interactions among amino acids are higher within a domain than across domains. This is evident in the spectra of the protein backbone and the side-chain graphs, thus differentiating the structural domains from one another. Further, residues that occur at the interface of two domains can also be easily identified from the spectra. This method is simple, elegant, and robust. Moreover, a single numeric computation yields both the domain definitions and the interface residues.

6.
Protein structure prediction   Cited by: 2 (self-citations: 0, citations by others: 2)
The prediction of protein structure, based primarily on sequence and structure homology, has become an increasingly important activity. Homology models have become more accurate and their range of applicability has increased. Progress has come, in part, from the flood of sequence and structure information that has appeared over the past few years, and also from improvements in analysis tools. These include profile methods for sequence searches, the use of three-dimensional structure information in sequence alignment and new homology modeling tools, specifically in the prediction of loop and side-chain conformations. There have also been important advances in understanding the physical chemical basis of protein stability and the corresponding use of physical chemical potential functions to distinguish correctly folded from incorrectly folded protein conformations.

7.
De novo peptide sequencing by mass spectrometry (MS) can determine the amino acid sequence of an unknown peptide without reference to a protein database. MS-based de novo sequencing assumes special importance in focused studies of families of biologically active peptides and proteins, such as hormones, toxins, and antibodies, for which amino acid sequences may be difficult to obtain through genomic methods. These protein families often exhibit sequence homology or characteristic amino acid content; yet, current de novo sequencing approaches do not take advantage of this prior knowledge and, hence, search an unnecessarily large space of possible sequences. Here, we describe an algorithm for de novo sequencing that incorporates sequence constraints into the core graph algorithm and thereby reduces the search space by many orders of magnitude. We demonstrate our algorithm in a study of cysteine-rich toxins from two cone snail species (Conus textile and Conus stercusmuscarum) and report 13 de novo and about 60 total toxins.

8.
This tutorial article introduces mass spectrometry (MS) for peptide fragmentation and protein identification. The current approaches being used for protein identification include top-down and bottom-up sequencing. Top-down sequencing, a relatively new approach that involves fragmenting intact proteins directly, is briefly introduced. Bottom-up sequencing, a traditional approach that fragments peptides in the gas phase after protein digestion, is discussed in more detail. The most widely used ion activation and dissociation process, gas-phase collision-activated dissociation (CAD), is discussed from a practical point of view. Infrared multiphoton dissociation (IRMPD) and electron capture dissociation (ECD) are introduced as two alternative dissociation methods. For spectral interpretation, the common fragment ion types in peptide fragmentation and their structures are introduced; the influence of instrumental methods on the fragmentation pathways and final spectra is discussed. A discussion is also provided on the complications in sample preparation for MS analysis. The final section of this article provides a brief review of recent research efforts on different algorithmic approaches being developed to improve protein identification searches.

9.

Background  

Homology is a key concept in both evolutionary biology and genomics. Detection of homology is crucial in fields like the functional annotation of protein sequences and the identification of taxon-specific genes. Basic homology searches are still frequently performed by pairwise search methods such as BLAST. Vast improvements have been made in the identification of homologous proteins by using more advanced methods that use sequence profiles. However, additional improvement could be made by exploiting sources of genomic information other than the primary sequence or tertiary structure.

10.
This paper presents a novel method to detect side-chain clusters in protein three-dimensional structures using a graph spectral approach. Protein side-chain interactions are represented by a labeled graph in which the nodes of the graph represent the Cbeta atoms and the edges represent the distance between the Cbeta atoms. The distance information and the non-bonded connectivity of the residues are represented in the form of a matrix called the Laplacian matrix. The constructed matrix is diagonalized and clustering information is obtained from the vector components associated with the second lowest eigenvalue and cluster centers are obtained from the vector components associated with the top eigenvalues. The method uses global information for clustering and a single numeric computation is required to detect clusters of interest. The approach has been adopted here to detect a variety of side-chain clusters and identify the residue which makes the largest number of interactions among the residues forming the cluster (cluster centers). Detecting such clusters and cluster centers is important from a protein structure and folding point of view. The crucial residues which are important in the folding pathway as determined by PhiF values (a measure of the effect of a mutation on the stability of the transition state of folding) as obtained from protein engineering methods, can be identified from the vector components corresponding to the top eigenvalues. Expanded clusters are detected near the active and binding site of the protein, supporting the nucleation condensation hypothesis for folding. The method is also shown to detect domains in protein structures and conserved side-chain clusters in topologically similar proteins.
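The clustering step in this abstract (reading clusters off the eigenvector of the second lowest Laplacian eigenvalue) can be illustrated with a small pure-Python sketch. It makes several simplifying assumptions that are not in the abstract: edges are unweighted rather than distance-weighted, the eigenvector is found by shifted power iteration instead of a full diagonalization, and the six-node graph is a toy example rather than a protein.

```python
import math

def laplacian(n, edges):
    """Unweighted graph Laplacian L = D - A as a list of rows."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L

def second_lowest_eigenvector(L, iterations=500):
    """Approximate the eigenvector of the second lowest eigenvalue of L.

    Power iteration on (shift * I - L), with the constant vector (eigenvalue 0)
    projected out at every step. Assumes the graph is connected.
    """
    n = len(L)
    # The largest eigenvalue of L is at most twice the maximum degree.
    shift = 2.0 * max(L[i][i] for i in range(n)) + 1.0
    # Deterministic start vector whose components sum to zero.
    v = [float(i) - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        w = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
vector = second_lowest_eigenvector(laplacian(6, edges))
cluster_a = {i for i, x in enumerate(vector) if x < 0}
cluster_b = {i for i, x in enumerate(vector) if x >= 0}
```

The sign pattern of the vector separates the two triangles. In practice a library eigensolver would replace the power iteration, and the edge weights would carry the distance information described in the abstract.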

11.
Protein identification has been greatly facilitated by database searches against protein sequences derived from product ion spectra of peptides. This approach is primarily based on the use of fragment ion mass information contained in a MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions are used to provide additional discriminating power for much improved protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference of corresponding peak pairs in overlaid spectra of (15)N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in (15)N-enriched and normal media, respectively. It is demonstrated that, using 2D information, i.e., m/z and its associated nitrogen number, this method can not only confirm protein identification results generated by MS/MS database searching, but also identify peptides that are not possible to identify by database searching alone. Examples are presented of analyzing Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from the digests of low abundance proteins, which can still give positive protein identification using this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and identification of posttranslational or other chemical modifications. We envision that this method should be particularly useful for proteome expression profiling of organelles or cells that can be grown in (15)N-enriched media.
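The nitrogen-number determination described in this abstract comes down to simple arithmetic on a peak pair, sketched below. The sketch assumes complete (15)N labeling and a known charge state; the function name and the example m/z values are hypothetical, and only the (15)N minus (14)N mass difference is a physical constant.

```python
N15_N14_MASS_DIFF = 0.997035  # Da, mass of 15N minus mass of 14N

def nitrogen_count(mz_unlabeled, mz_labeled, charge=1):
    """Estimate the number of nitrogen atoms in an ion from a labeled/unlabeled peak pair."""
    mass_shift = (mz_labeled - mz_unlabeled) * charge
    return round(mass_shift / N15_N14_MASS_DIFF)

# Hypothetical doubly charged peak pair whose m/z values differ by 12 * 0.997035 / 2.
count = nitrogen_count(500.0, 505.98221, charge=2)
```

Each candidate fragment or peptide can then be checked against both its m/z and its nitrogen count, which is the second dimension the abstract refers to.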

12.
13.
Chung SY, Subbiah S. Proteins 1999, 35(2): 184-194
The precision and accuracy of protein structures determined by nuclear magnetic resonance (NMR) spectroscopy depend on the completeness of the input experimental data set. Typically, rather than a single structure, an ensemble of up to 20 equally representative conformers is generated and routinely deposited in the Protein Database. There are substantially more experimentally derived restraints available to define the main-chain coordinates than those of the side chains. Consequently, the side-chain conformations among the conformers are more variable and less well defined than those of the backbone. Even when a side chain is determined with high precision and is found to adopt very similar orientations among all the conformers in the ensemble, it is possible that its orientation might still be incorrect. Thus, it would be helpful if there were a method to assess independently the side-chain orientations determined by NMR. Recently, homology modeling by side-chain packing algorithms has been shown to be successful in predicting the side-chain conformations of the buried residues for a protein when the main-chain coordinates and sequence information are given. Since the main-chain coordinates determined by NMR are consistently more reliable than those of the side chains, we have applied the side-chain packing algorithms to predict side-chain conformations that are compatible with the NMR-derived backbone. Using four test cases where the NMR solution structures and the X-ray crystal structure of the same protein are available, we demonstrate that the side-chain packing method can provide independent validation for the side-chain conformations of NMR structures. Comparison of the side-chain conformations derived by side-chain packing prediction and by NMR spectroscopy demonstrates that when there is agreement between the NMR model and the predicted model, on average 78% of the time the X-ray structure also concurs. While the side-chain packing method can confirm the reliable residue conformations in NMR models, more importantly, it can also identify the questionable residue conformations with an accuracy of 60%. This validation method can serve to increase the confidence level for potential users of structural models determined by NMR.

14.
We present a method for peptide and protein identification based on LC-MS profiling. The method identifies peptides at high throughput without expending the sequencing time necessary for identification based on CID spectra. The measurable peptide properties of mass and liquid chromatographic elution conditions are used to characterize and differentiate peptide features, and these peptide features are matched to a reference database from previously acquired and archived LC-MS/MS experiments to generate sequence assignments. The matches are scored according to the probability of an overlap between the peptide feature and the database peptides, resulting in a ranked list of possible peptide sequences for each peptide submitted. This method resulted in 6 times more peptide sequence identifications from a single LC-MS analysis of yeast than from shotgun peptide sequencing using LC-MS/MS.
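The matching step in this abstract (comparing a feature's mass and elution time against an archived reference list and ranking the hits) might look roughly like the sketch below. The tolerances, the Gaussian-style score, and the reference entries are all assumptions made for illustration; the abstract does not specify the authors' probability model.

```python
import math

def match_feature(feature, reference, ppm_tol=10.0, rt_tol=2.0):
    """Rank reference peptides that lie close to a feature in mass and retention time."""
    mass, rt = feature
    hits = []
    for sequence, ref_mass, ref_rt in reference:
        ppm_error = (mass - ref_mass) / ref_mass * 1e6
        rt_error = rt - ref_rt
        if abs(ppm_error) <= ppm_tol and abs(rt_error) <= rt_tol:
            # Illustrative score: product of two Gaussian-like terms, 1.0 for a perfect match.
            score = (math.exp(-0.5 * (ppm_error / ppm_tol) ** 2)
                     * math.exp(-0.5 * (rt_error / rt_tol) ** 2))
            hits.append((score, sequence))
    return [sequence for score, sequence in sorted(hits, reverse=True)]

# Hypothetical reference entries: (sequence label, mass in Da, retention time in min).
reference = [
    ("PEPTIDEA", 1000.5000, 30.0),
    ("PEPTIDEB", 1000.5050, 31.0),
    ("PEPTIDEC", 1200.6000, 30.2),
]
ranked = match_feature((1000.5010, 30.1), reference)
```

Because no MS/MS scan is needed per peptide, every detected feature in a single LC-MS run can be looked up this way, which is consistent with the throughput gain the abstract reports.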

15.
Performance evaluation of existing de novo sequencing algorithms   Cited by: 1 (self-citations: 0, citations by others: 1)
Two methods have been developed for protein identification from tandem mass spectra: database searching and de novo sequencing. De novo sequencing identifies peptides directly from tandem mass spectra. Among many proposed algorithms, we evaluated the performance of five de novo sequencing algorithms: AUDENS, Lutefisk, NovoHMM, PepNovo, and PEAKS. Our evaluation methods are based on calculation of relative sequence distance (RSD), algorithm sensitivity, and spectrum quality. We found that de novo sequencing algorithms have different performance in analyzing QSTAR and LCQ mass spectrometer data, but in general, perform better in analyzing QSTAR data than LCQ data. For the QSTAR data, the performance order of the five algorithms is PEAKS > Lutefisk, PepNovo > AUDENS, NovoHMM. The performance of PEAKS, Lutefisk, and PepNovo strongly depends on the spectrum quality and increases with an increase of spectrum quality. However, AUDENS and NovoHMM are not sensitive to the spectrum quality. Compared with the other four algorithms, PEAKS has the best sensitivity and also has the best performance in the entire range of spectrum quality. For the LCQ data, the performance order is NovoHMM > PepNovo, PEAKS > Lutefisk > AUDENS. NovoHMM has the best sensitivity, and its performance is the best in the entire range of spectrum quality. But the overall performance of NovoHMM is not significantly different from the performance of PEAKS and PepNovo. AUDENS does not give a good performance in analyzing either QSTAR or LCQ data.

16.
MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. As thousands of spectra are generated by a mass spectrometer in one hour, the speed of database searching is critical, especially when searching against a large sequence database, when the peptide is generated by some unknown or non-specific enzyme, or even when the target peptides have post-translational modifications (PTMs). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of them are due to peptides from non-specific digestion by unknown enzymes or to amino acid modifications. In another case, scientists may choose to use non-specific enzymes such as pepsin or thermolysin for proteolysis in a proteomic study, because not all proteins are amenable to digestion by site-specific enzymes, and furthermore many digested peptides may not fall within the range of molecular weight suitable for mass spectrometry analysis. Interpreting mass spectra of these kinds costs database search engines a great deal of computational time. OVERVIEW: The present study was designed to speed up the database searching process for both cases. More specifically, we employed an approach combining the suffix tree data structure and the spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We design an efficient algorithm to compute a matching threshold with some statistical significance level, e.g. p = 0.01, for each spectrum, and use it to select candidate peptides. Then we rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of any modification to a protein.
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.

17.
Shadforth I, Crowther D, Bessant C. Proteomics 2005, 5(16): 4082-4095
Current proteomics experiments can generate vast quantities of data very quickly, but this has not been matched by data analysis capabilities. Although there have been a number of recent reviews covering various aspects of peptide and protein identification methods using MS, comparisons of which methods are either the most appropriate for, or the most effective at, their proposed tasks are not readily available. As the need for high-throughput, automated peptide and protein identification systems increases, the creators of such pipelines need to be able to choose algorithms that are going to perform well both in terms of accuracy and computational efficiency. This article therefore provides a review of the currently available core algorithms for PMF, database searching using MS/MS, sequence tag searches and de novo sequencing. We also assess the relative performances of a number of these algorithms. As there is limited reporting of such information in the literature, we conclude that there is a need for the adoption of a system of standardised reporting on the performance of new peptide and protein identification algorithms, based upon freely available datasets. We go on to present our initial suggestions for the format and content of these datasets.

18.
The problem of constructing all-atom model co-ordinates of a protein from an outline of the polypeptide chain is encountered in protein structure determination by crystallography or nuclear magnetic resonance spectroscopy, in model building by homology and in protein design. Here, we present an automatic procedure for generating full protein co-ordinates (backbone and, optionally, side-chains) given the C alpha trace and amino acid sequence. To construct backbones, a protein structure database is first scanned for fragments that locally fit the chain trace according to distance criteria. A best path algorithm then sifts through these segments and selects an optimal path with minimal mismatch at fragment joints. In blind tests, using fully known protein structures, backbones (C alpha, C, N, O) can be reconstructed with a reliability of 0.4 to 0.6 Å root-mean-square position deviation and not more than 0 to 5% peptide flips. This accuracy is sufficient to identify possible errors in protein co-ordinate sets. To construct full co-ordinates, side-chains are added from a library of frequently occurring rotamers using a simple and fast Monte Carlo procedure with simulated annealing. In tests on X-ray structures determined at better than 2.5 Å resolution, the positions of side-chain atoms in the protein core (less than 20% relative accessibility) have an accuracy of 1.6 Å (r.m.s. deviation) and 70% of chi 1 angles are within 30 degrees of the X-ray structure. The computer program MaxSprout is available on request.

19.
An important but difficult problem in proteomics is the identification of post-translational modifications (PTMs) in a protein. In general, the process of PTM identification by aligning experimental spectra with theoretical spectra from peptides in a peptide database is very time consuming and may lead to a high false positive rate. In this paper, we introduce a new approach that is both efficient and effective for blind PTM identification. Our work consists of the following phases. First, we develop a novel tree decomposition based algorithm that can efficiently generate peptide sequence tags (PSTs) from an extended spectrum graph. Sequence tags are selected from all maximum weighted antisymmetric paths in the graph and their reliabilities are evaluated with a score function. An efficient deterministic finite automaton (DFA) based model is then developed to search a peptide database for candidate peptides by using the generated sequence tags. Finally, a point process model, an efficient blind search approach for PTM identification, is applied to report the correct peptide and PTMs if there are any. Our tests on 2657 experimental tandem mass spectra and 2620 experimental spectra with one artificially added PTM show that, in addition to high efficiency, our ab-initio sequence tag selection algorithm achieves better or comparable accuracy to other approaches. Database search results show that the sequence tags of lengths 3 and 4 filter out more than 98.3% and 99.8% of peptides, respectively, when applied to a yeast peptide database. With the dramatically reduced search space, the point process model achieves significant improvement in accuracy as well. AVAILABILITY: The software is available upon request.
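The database-filtering effect of sequence tags reported in this abstract can be illustrated with a very small sketch. It uses plain substring matching as a stand-in for the DFA-based search the authors describe, and the peptide strings and tags are made up for the example.

```python
def filter_by_tags(peptides, tags):
    """Keep only the peptides that contain at least one of the sequence tags."""
    return [peptide for peptide in peptides if any(tag in peptide for tag in tags)]

# Hypothetical peptide database and length-3 tags.
peptides = ["ACDEFGHIK", "LMNPQRSTV", "GHIKLMNPR", "AAAAGGGGK"]
tags = ["FGH", "MNP"]
candidates = filter_by_tags(peptides, tags)
fraction_removed = 1 - len(candidates) / len(peptides)
```

On this toy list only one peptide in four is removed; on a real proteome-scale database, longer tags are far more selective, which is the effect the quoted 98.3% and 99.8% figures describe.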

20.
The primary structure of Beijing duck apolipoprotein A-1 was determined by sequencing peptide fragments derived from tryptic and endoproteinase Asp-N digestion of the protein, and alignment with homologous chicken apo A-1. All of the peptide fragments were isolated by high-pressure liquid chromatography (HPLC) with a Vydac C18 column using a trifluoroacetic acid (TFA) buffer system. The N-terminus of the protein was determined to be aspartic acid by directly sequencing 52 residues of the intact protein. The C-terminus was alanine. The protein contains 240 amino acid residues. By analysis of the whole protein and its tryptic peptides, a six amino acid (Arg-Tyr-Phe-Trp-Gln-His) prosegment was determined. No cross-reactivity between duck and human apo A-1 with a goat antiserum against human apo A-1 was found. Sequence analysis of apo A-1 of other species indicates that amino acid substitutions in rat are more extensive than in other mammals. Isoleucine residues in apo A-1 are inversely correlated to the homology of human to other species, except dog.
