Similar articles
Found 20 similar articles; search time 15 ms
1.
The accelerated pace of genomic sequencing has increased the demand for structural models of gene products. Improved quantitative methods are needed to study the many systems (e.g., macromolecular assemblies) for which data are scarce. Here, we describe a new molecular dynamics method for protein structure determination and molecular modeling. An energy function, or database potential, is derived from distributions of interatomic distances obtained from a database of known structures. X-ray crystal structures are refined by molecular dynamics with the new energy function replacing the van der Waals potential. Compared to standard methods, this method improved the atomic positions, interatomic distances, and side-chain dihedral angles of structures randomized to mimic the early stages of refinement. The greatest enhancement in side-chain placement was observed for groups that are characteristically buried. More accurate calculated model phases will follow from improved interatomic distances. Details usually seen only in high-resolution refinements were improved, as is shown by an R-factor analysis. The improvements were greatest when refinements were carried out using X-ray data truncated at 3.5 Å. The database potential should therefore be a valuable tool for determining X-ray structures, especially when only low-resolution data are available.
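
The abstract does not give the functional form of the database potential. A common way to turn a distance distribution into an energy is the inverse Boltzmann relation, E(r) = -kT ln(P_obs(r) / P_ref(r)). The sketch below assumes that form, along with a hypothetical bin width, cutoff, and pseudocount; it is an illustration of the general idea, not the authors' method.

```python
import math


def database_potential(distances, reference, bin_width=0.5, max_r=10.0, pseudocount=1.0):
    """Return per-bin energies in units of kT, assuming an inverse Boltzmann form.

    distances: interatomic distances observed in known structures (assumed input)
    reference: distances drawn from a reference state (assumed input)
    """
    n_bins = int(max_r / bin_width)

    def histogram(values):
        # The pseudocount keeps every bin non-empty so the logarithm is defined.
        counts = [pseudocount] * n_bins
        for r in values:
            if r < 0:
                continue
            idx = int(r / bin_width)
            if idx < n_bins:
                counts[idx] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p_obs = histogram(distances)
    p_ref = histogram(reference)
    # Distances seen more often than in the reference get a negative (favorable) energy.
    return [-math.log(o / r) for o, r in zip(p_obs, p_ref)]
```

In this sketch, distances that are over-represented in the database relative to the reference state receive negative energies, and under-represented ones receive positive energies.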

2.
MS-based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression-based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed that, for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
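
A minimal sketch of the monotone-fitting step, assuming each match carries a score and a target/decoy label. The decoy indicator is regressed on the score under a non-increasing constraint using the pool-adjacent-violators algorithm. The function names are hypothetical, and the scaling from decoy fraction to an LFDR (which depends on how the randomized database is built) is left out because the abstract does not specify it.

```python
def pava_nonincreasing(values):
    """Least-squares fit of a non-increasing sequence to `values`."""
    blocks = []  # each block holds [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge blocks while the newest mean exceeds the previous mean,
        # which would violate the non-increasing constraint.
        while len(blocks) > 1 and (
            blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]
        ):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted


def monotone_decoy_fraction(matches):
    """matches: list of (score, is_decoy) pairs.

    Returns (score, fitted decoy fraction) pairs sorted by ascending score.
    """
    ordered = sorted(matches, key=lambda m: m[0])
    indicators = [1.0 if is_decoy else 0.0 for _, is_decoy in ordered]
    fitted = pava_nonincreasing(indicators)
    return [(score, frac) for (score, _), frac in zip(ordered, fitted)]
```

The fitted fraction can only fall as the score rises, which is the monotonicity property the abstract lists as advantage (i).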

3.
The SwePep database is designed for endogenous peptides and mass spectrometry. It contains information about the peptides such as mass, pI, precursor protein and potential post-translational modifications. Here, we have improved and extended the SwePep database with tandem mass spectra, by adding a locally curated version of the global proteome machine database (GPMDB). In peptidomics practice, many peptide sequences have multiple tandem mass spectra of differing quality. The new tandem mass spectra database in SwePep enables validation of low quality spectra using high quality tandem mass spectra. The validation is performed by comparing the fragmentation patterns of the two spectra using algorithms for calculating the correlation coefficient between the spectra. The present study is the first step in developing a tandem spectrum database for endogenous peptides that can be used for spectrum-to-spectrum identifications instead of peptide identifications using traditional protein sequence database searches.
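
One simple way to compute a correlation coefficient between two spectra is to bin the peaks on a common m/z grid and take the Pearson correlation of the binned intensities. The sketch below assumes that approach; the bin width, m/z range, and function names are hypothetical and are not taken from SwePep.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins. peaks: list of (mz, intensity)."""
    n_bins = int(max_mz / bin_width) + 1
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            binned[idx] += intensity
    return binned


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spectrum_correlation(peaks_a, peaks_b, bin_width=1.0, max_mz=2000.0):
    """Correlation between two spectra after binning on the same grid."""
    return pearson(
        bin_spectrum(peaks_a, bin_width, max_mz),
        bin_spectrum(peaks_b, bin_width, max_mz),
    )
```

Because Pearson correlation ignores overall scale, a weak spectrum with the same fragmentation pattern as a strong one still scores close to 1.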

4.
5.
In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.
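
The abstract does not state the model itself. As a rough illustration only, the sketch below assumes the usual concatenated-database estimate (false-positive rate = 2 x reverse hits / total hits) and treats the reverse-hit count as Poisson-distributed, so its standard deviation is the square root of the count. Both the error model and the function name are assumptions made here, not the authors' derivation.

```python
import math


def false_positive_rate_with_uncertainty(n_total, n_reverse):
    """Return (estimated rate, approximate standard error).

    Assumes a reverse hit implies one equally likely wrong forward hit, and
    that the reverse-hit count follows Poisson counting statistics.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    rate = 2.0 * n_reverse / n_total
    std_err = 2.0 * math.sqrt(n_reverse) / n_total
    return rate, std_err
```

Under these assumptions the relative uncertainty shrinks as 1/sqrt(reverse hits), so splitting a data set into small subsets (for example by charge state) makes each subset's estimate noisier.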

6.
Xue X  Wu S  Wang Z  Zhu Y  He F 《Proteomics》2006,6(23):6134-6145
The calculation of protein probabilities is one of the most intractable problems in large-scale proteomic research. Currently available estimation methods, for example, ProteinProphet, PROT_PROBE, the Poisson model and two-peptide hits, employ different models to address this problem. Until now, no efficient method has been available for comparative evaluation of these methods on large-scale datasets. In order to evaluate these various methods, we developed a semi-random sampling model to simulate large-scale proteomic data. In this model, the identified peptides were sampled from the designed proteins and their cross-correlation scores were simulated according to the results from reverse database searching. The simulated result for 18 control proteins was consistent with the experimental one, demonstrating the efficiency of our model. According to the simulated results for a human liver sample, ProteinProphet returned slightly higher probabilities and lower specificity than real cases. PROT_PROBE was a more efficient method with higher specificity. Predicted results from the Poisson model roughly coincide with real datasets, and the method of two-peptide hits seems solid but imprecise. However, the probabilities of identified proteins are strongly correlated with several experimental factors including spectra number, database size and protein abundance distribution.

7.
We describe the theoretical basis for a peptide identification method wherein peptides are represented as vectors based on their amino acid composition and grouped into clusters. Unknown peptides are identified by finding the database cluster and peptide entries with the shortest Euclidean distance. We demonstrate that the amino acid composition of peptides is virtually as informative as the sequence and allows rapid peptide identification more accurately than peptide mass alone.
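
A minimal sketch of the vector representation and distance search described above, assuming a 20-dimensional count vector over the standard amino acids. The clustering step is omitted, so this version scans every database peptide directly; all function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def composition_vector(peptide):
    """Count of each standard amino acid in the peptide, in AMINO_ACIDS order."""
    return [peptide.count(aa) for aa in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_peptide(query_composition, database):
    """Return the database peptide whose composition vector is closest to the query."""
    return min(
        database,
        key=lambda pep: euclidean(query_composition, composition_vector(pep)),
    )
```

Composition discards residue order, so any permutation of a peptide maps to the same vector; grouping vectors into clusters, as the abstract describes, would narrow the search before this final distance comparison.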

8.
Posterior probabilities for a change-point using ranks   Total citations: 1 (self-citations: 0, citations by others: 1)
PETTITT  A. N. 《Biometrika》1981,68(2):443-450

9.
Bioactive peptides play critical roles in regulating most biological processes in animals, and have considerable biological, medical and industrial importance. A number of peptides have been discovered, usually based on their biological activities in vitro or on their sequence similarities in silico. Through searches in Swiss-Prot and TrEMBL protein databases using BLAST alignment tools and other in silico methods, all currently known bioactive peptides and their precursor proteins are extracted. In addition, 132 recently discovered putative peptide genes in Drosophila as well as their orthologs in other species are collected. In total, 20 027 bioactive peptides from 19 438 precursor proteins covering 2820 metazoan species are retained, and they, respectively, make up a peptide and a peptide precursor database. The peptides and peptide precursor proteins are further classified into 373 families, 178 of which are represented by Prosite, Pfam or Smart motifs, or by typical peptide motifs that have been constructed recently. The remaining 195 families are novel peptide families. The motifs characterizing the 178 peptide families are saved into a peptide motif database. The peptide, peptide precursor and peptide motif databases (version 1.0) are the most complete peptide, precursor and peptide motif collection in Metazoa so far. They are available on the WWW at http://www.peptides.be/.

10.
11.
We report a hybrid search method combining database and spectral library searches that allows for a straightforward approach to characterizing the error rates from the combined data. Using these methods, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search method increased the number of spectra that can be assigned to a peptide in a global proteomics study by 57-147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of using consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. A common scoring metric based on recent developments linking data analysis and statistical thermodynamics is used, which allows the use of a conservative estimate of error rates for the combined data. We applied this approach to proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.

12.
Many software tools have been developed for the automated identification of peptides from tandem mass spectra. The accuracy and sensitivity of the identification software via database search are critical for successful proteomics experiments. A new database search tool, PEAKS DB, has been developed by incorporating the de novo sequencing results into the database search. PEAKS DB achieves significantly improved accuracy and sensitivity over two other commonly used software packages. Additionally, a new result validation method, decoy fusion, has been introduced to solve the issue of overconfidence that exists in the conventional target decoy method for certain types of peptide identification software.

13.
Kwon KH  Kim M  Kim JY  Kim KW  Kim SI  Park YM  Yoo JS 《Proteomics》2003,3(12):2305-2309
We compared peptide identification by database (DB) search methods with de novo sequencing results for proteomics study in an organism without genome sequence information. When the former was done by searching the Expressed Sequence Tag (EST) DB of the sample organism or the NCBI nonredundant (nr) protein DB of green plants using either the MASCOT or SEQUEST software program, it was confirmed that the former is as accurate as the latter. Twice as many peptides were identified from the EST DB as from the nr protein DB, in spite of the fact that the EST DB has less data (26 222 ESTs) than the NCBI nr protein DB (224 238). This study demonstrates that EST DB with tandem mass spectra can be used reliably for high-throughput proteomics studies in an organism without genome information.

14.
We examine stochastic inequality probabilities of the form P (X > Y) and P (X > max (Y, Z)) where X, Y, and Z are random variables with beta, gamma, or inverse gamma distributions. We discuss the applications of such inequality probabilities to adaptively randomized clinical trials as well as methods for calculating their values.
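
The abstract does not say which calculation methods are discussed. As a simple stand-in, the sketch below estimates both probabilities by Monte Carlo sampling with Python's standard `random` module, assuming independent variables. The function names, sample size, and seed are hypothetical choices.

```python
import random


def prob_x_greater_y(sample_x, sample_y, n=100_000, seed=0):
    """Monte Carlo estimate of P(X > Y) for independent X and Y.

    sample_x, sample_y: callables that take a random.Random and return one draw.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if sample_x(rng) > sample_y(rng))
    return hits / n


def prob_x_greater_max(sample_x, sample_y, sample_z, n=100_000, seed=0):
    """Monte Carlo estimate of P(X > max(Y, Z)) for independent X, Y, Z."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        if sample_x(rng) > max(sample_y(rng), sample_z(rng)):
            hits += 1
    return hits / n
```

For example, `prob_x_greater_y(lambda r: r.betavariate(8, 2), lambda r: r.betavariate(2, 8))` compares two beta-distributed response rates. Gamma draws are available through `r.gammavariate(shape, scale)`, and an inverse gamma draw can be obtained as the reciprocal of a gamma draw.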

15.
We describe here a conceptually unique set of individual synthetic peptide combinatorial libraries (SPCLs), termed a positional scanning SPCL (PS-SPCL), that can be used for the rapid (i.e., a single day) identification of peptide sequences that bind with high affinity to antibodies, receptors or other acceptor molecules. The PS-SPCL described here is made up of six individual positional peptide libraries, each one consisting of hexamers with a single position defined and five positions as mixtures. As an example of the utility of such PS-SPCLs, the antigenic determinants recognized by two different monoclonal antibodies were correctly identified upon a single screening.
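
A minimal sketch of how a positional scan might be read out, assuming each of the six positional libraries yields one measured signal per defined amino acid and that the best residue at each position is chosen independently. The data layout, the selection rule, and the function name are assumptions for illustration; the abstract does not describe the readout procedure.

```python
def deconvolute(screen):
    """Pick the highest-signal defined residue at each position.

    screen: list with one dict per position, mapping the defined amino acid
    of each mixture to its measured signal (hypothetical layout).
    """
    return "".join(max(position, key=position.get) for position in screen)
```

With six positions, a single screening of all mixtures gives a full hexamer candidate, which is consistent with the single-day identification the abstract describes.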

16.
17.
A novel hybrid methodology for the automated identification of peptides via de novo integer linear optimization, local database search, and tandem mass spectrometry is presented in this article. A modified version of the de novo identification algorithm PILOT is utilized to construct accurate de novo peptide sequences. A modified version of the local database search tool FASTA is used to query these de novo predictions against the nonredundant protein database to resolve any low-confidence amino acids in the candidate sequences. The computational burden associated with performing several alignments is alleviated with the use of distributive computing. Extensive computational studies are presented for this new hybrid methodology, as well as comparisons with MASCOT for a set of 38 quadrupole time-of-flight (QTOF) and 380 OrbiTrap tandem mass spectra. The results for our proposed hybrid method for the OrbiTrap spectra are also compared with a modified version of PepNovo, which was trained for use on high-precision tandem mass spectra, and the tag-based method InsPecT. The de novo sequences of PILOT and PepNovo are also searched against the nonredundant protein database using CIDentify to compare with the alignments achieved by our modifications of FASTA. The comparative studies demonstrate the excellent peptide identification accuracy gained from combining the strengths of our de novo method, which is based on integer linear optimization, and database driven search methods.

18.
19.
Protein phosphorylation, one of the most important protein post-translational modifications, is involved in various biological processes, and the identification of phosphorylated peptides (phosphopeptides) and their corresponding phosphorylation sites (phosphosites) will facilitate the understanding of the molecular mechanism and function of phosphorylation. Mass spectrometry (MS) provides a high-throughput technology that enables the identification of large numbers of phosphosites. PhoPepMass is designed to assist human phosphopeptide identification from MS data based on a specific database of phosphopeptide masses and a multivariate hypergeometric matching algorithm. It contains 244,915 phosphosites from several public sources. Moreover, the accurate masses of peptides and fragments with phosphosites were calculated. It is the first database that provides a systematic resource for the query of phosphosites on peptides and their corresponding masses. This allows researchers to search for proteins with reported phosphosites, to browse detailed phosphopeptide and fragment information, to match masses from MS analyses with a defined threshold to the corresponding phosphopeptide, and to compare proprietary phosphopeptide discovery results with results from previous studies. Additionally, database search software was created and a “two-stage search strategy” is suggested to identify phosphopeptides from tandem mass spectra of proteomics data. We expect PhoPepMass to be a useful tool and a source of reference for proteomics researchers. PhoPepMass is available at https://www.scbit.org/phopepmass/index.html.
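
A minimal sketch of the mass calculation and threshold matching mentioned above, using standard monoisotopic residue masses and adding one HPO3 group (79.96633 Da) per phosphosite. This is not the PhoPepMass implementation: the function names and the default tolerance are hypothetical, and the multivariate hypergeometric matching algorithm is not reproduced here.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H2O added for the free N- and C-termini
PHOSPHO = 79.96633  # HPO3 added per phosphorylated residue


def phosphopeptide_mass(sequence, n_phospho=0):
    """Neutral monoisotopic mass of a peptide carrying n_phospho phosphate groups."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_phospho * PHOSPHO


def match_masses(observed_mass, candidates, tolerance_da=0.01):
    """Return the (sequence, n_phospho) candidates within tolerance of the observed mass."""
    return [
        (seq, n)
        for seq, n in candidates
        if abs(phosphopeptide_mass(seq, n) - observed_mass) <= tolerance_da
    ]
```

Each phosphosite shifts the peptide mass by the same fixed amount, so a precomputed table of phosphopeptide masses can be searched with a simple tolerance window before any more elaborate scoring is applied.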

20.