Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
2.
3.
Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a data set of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the "ISB standard protein mix", using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF-TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.
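
The abstract mentions estimating the correctness of assignments from reverse or decoy databases. The following is a minimal sketch of that general idea, not the procedure used to build the Standard Protein Mix Database. It assumes each peptide-spectrum match carries a score (higher is better) and a flag saying whether it hit a decoy sequence; the function name `decoy_fdr`, the simple decoys/targets estimate, and the example scores are all illustrative assumptions.

```python
from typing import List, Tuple


def decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the false discovery rate among target matches at a score cutoff.

    psms: (score, is_decoy) pairs; a higher score means a better match.
    The estimate is (#decoy hits) / (#target hits) at or above the threshold,
    which assumes wrong matches fall on targets and decoys equally often.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, decoys / targets)


# Made-up scores for illustration only.
example = [(4.1, False), (3.8, False), (3.5, True), (3.2, False), (2.0, True), (1.9, False)]
fdr_at_3 = decoy_fdr(example, 3.0)  # 1 decoy and 3 targets pass the cutoff
```

Raising the threshold until the estimate drops below a chosen level is one common way to pick a score cutoff; other estimators of the decoy-based rate exist and may differ by a constant factor.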

4.
The subject of this tutorial is protein identification and characterisation by database searching of MS/MS Data. Peptide Mass Fingerprinting is excluded because it is covered in a separate tutorial. Practical aspects of database searching are emphasised, such as choice of sequence database, effect of mass tolerance, and how to identify post-translational modifications. The relationship between sensitivity and specificity is discussed, as is the challenge of using peptide match information to infer which proteins were present in the sample. Since these tutorials are introductory in nature, most references are to reviews, rather than primary research papers. Some familiarity with mass spectrometry and protein chemistry is assumed. There is an accompanying slide presentation, including speaker notes, and a collection of web-based, practical exercises, designed to reinforce key points. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP 6).

5.
Synonymous codon replacement can change protein structure and function, indicating that protein structure depends on DNA sequence. During heterologous protein expression, low expression or formation of insoluble aggregates may be attributable to differences in synonymous codon usage between expression and natural hosts. This discordance may be particularly important during translation of the domain boundaries (link/end segments) that separate elements of higher ordered structure. Within such regions, ribosomal progression slows as the ribosome encounters clusters of infrequently used codons that preferentially encode a subset of amino acids. To replicate the modulation of such localized translation rates during heterologous expression, we used known relationships between codon usage frequencies and secondary protein structure to develop an algorithm ("codon harmonization") for identifying regions of slowly translated mRNA that are putatively associated with link/end segments. It then recommends synonymous replacement codons having usage frequencies in the heterologous expression host that are less than or equal to the usage frequencies of native codons in the native expression host. For protein regions other than these putative link/end segments, it recommends synonymous substitutions with codons having usage frequencies matched as nearly as possible to the native expression system. Previous application of this algorithm facilitated E. coli expression, manufacture and testing of two Plasmodium falciparum vaccine candidates. Here we describe the algorithm in detail and apply it to E. coli expression of three additional P. falciparum proteins. Expression of the "recoded" genes exceeded that of the native genes by 4- to 1,000-fold, representing levels suitable for vaccine manufacture. The proteins were soluble and reacted with a variety of functional conformation-specific mAbs suggesting that they were folded properly and had assumed native conformation. 
Codon harmonization may further provide a general strategy for improving the expression of soluble functional proteins during heterologous expression in hosts other than E. coli.
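
The selection rule described in this abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `harmonize_codon`, the tie-breaking choice (closest frequency among the allowed codons), the fallback to the rarest host codon, and the frequency values are all assumptions made for the example, and the numbers are not taken from any real codon usage table.

```python
from typing import Dict


def harmonize_codon(native_freq: float, host_freqs: Dict[str, float], slow_region: bool) -> str:
    """Pick a synonymous codon for the heterologous expression host.

    native_freq: usage frequency of the native codon in the native host.
    host_freqs: usage frequency of each synonymous codon in the expression host.
    slow_region: True when the codon lies in a putative link/end segment.
    """
    candidates = host_freqs
    if slow_region:
        # Link/end segments: allow only codons used no more often in the
        # expression host than the native codon is used in the native host.
        limited = {c: f for c, f in host_freqs.items() if f <= native_freq}
        # Assumption: if no codon qualifies, fall back to the rarest host codon.
        rarest = min(host_freqs, key=host_freqs.get)
        candidates = limited or {rarest: host_freqs[rarest]}
    # Among the allowed codons, take the one closest in frequency to the native codon.
    return min(candidates, key=lambda c: abs(candidates[c] - native_freq))


# Hypothetical usage frequencies for four synonymous codons in the expression host.
host = {"CTG": 0.50, "CTC": 0.10, "TTA": 0.13, "CTA": 0.04}
in_link_segment = harmonize_codon(0.12, host, slow_region=True)   # "CTC"
elsewhere = harmonize_codon(0.12, host, slow_region=False)        # "TTA"
```

The two calls differ only in the `slow_region` flag: inside a putative link/end segment the slightly more frequent "TTA" is excluded, so the slower "CTC" is chosen instead.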

6.
RankGene: identification of diagnostic genes based on expression data (total citations: 9; self-citations: 0; citations by others: 9)
RankGene is a program for analyzing gene expression data and computing diagnostic genes based on their predictive power in distinguishing between different types of samples. The program integrates into one system a variety of popular ranking criteria, ranging from the traditional t-statistic to one-dimensional support vector machines. This flexibility makes RankGene a useful tool in gene expression analysis and feature selection.
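
One of the ranking criteria named above, the t-statistic, can be sketched in a few lines. This is an illustration of the general technique and not RankGene's code: the function names, the choice of Welch's unequal-variance form, and the toy expression values are assumptions.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Tuple


def t_statistic(a: List[float], b: List[float]) -> float:
    """Welch's two-sample t-statistic (sample variances, unequal-variance form)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expr: Dict[str, Tuple[List[float], List[float]]]) -> List[str]:
    """Order genes by |t| between the two sample classes, most discriminating first."""
    return sorted(expr, key=lambda g: abs(t_statistic(*expr[g])), reverse=True)


# Toy data: each gene maps to (expression in class 1, expression in class 2).
toy = {
    "geneB": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
    "geneC": ([5.0, 5.5, 6.0], [5.0, 6.0, 5.5]),
    "geneA": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
}
ranking = rank_genes(toy)
```

Each class needs at least two samples with non-zero variance in at least one class, otherwise the denominator is undefined; a real tool would have to guard against that case.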

7.
8.
Darvish A, Najarian K. Bio Systems, 2006, 83(2-3): 125-135
We propose a novel technique that constructs gene regulatory networks from DNA microarray data and gene-protein databases and then applies Mason's rule to systematically search for the most dominant regulators of the network. The algorithm then recommends the identified dominant regulator genes as the best candidates for future knock-out experiments. Actively choosing the genes for knock-out experiments allows optimal perturbation of the pathway and therefore produces the most informative DNA microarray data for pathway identification purposes. This approach is most advantageous in practice for the analysis of large pathways, where the time and cost of DNA microarray experiments can be reduced using the proposed optimal experiment design. The proposed method was successfully tested on the galactose regulatory network.

9.

Background  

Over the last decade, kinases have emerged as attractive therapeutic targets for a number of different diseases, and numerous high throughput screening efforts in the pharmaceutical community are directed towards discovery of compounds that regulate kinase function. The emerging utility of systems biology approaches has necessitated the development of multiplex tools suitable for proteomic-scale experiments to replace lower throughput technologies such as mass spectroscopy for the study of protein phosphorylation. Recently, a new approach for identifying substrates of protein kinases has applied the miniaturized format of functional protein arrays to characterize phosphorylation for thousands of candidate protein substrates in a single experiment. This method involves the addition of protein kinases in solution to arrays of immobilized proteins to identify substrates using highly sensitive radioactive detection and hit identification algorithms.

10.
This paper described a simple heuristic method for determining the merit of a set of peptide sequence assignments made using tandem mass spectra. The method involved comparing a prediction based on the known stochastic behavior of a sequence assignment algorithm with the assignments generated from a particular data set. A particular formulation of this comparison was defined through the construction of a plot of the data, the rho-diagram, as well as a parameter derived from this plot, the rho-score. This plot and parameter were shown to be able to readily characterize the relative quality of a set of peptide sequence assignments and to allow the straightforward determination of probability threshold values for the interpretation of proteomics data. This plot is independent of the algorithm or scoring scheme used to estimate the statistical significance of a set of experimental results; rather, it can be used as an objective test of the correctness of those estimates. The rho-score can also be used as a parameter to evaluate the relative merit of protein identifications, such as those made across proteome species taxonomic categories.

11.
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis following tryptic digestion of polyacrylamide gel pieces is a common technique used to identify proteins. This approach is rapid, sensitive, and user friendly, and is becoming widely available to scientists in a variety of biological fields. Here we introduce a simple and effective strategy called "mass processing" where the list of masses generated from a mass spectrometer undergoes two stages of data reduction before identification. Mass processing improves the ability to identify in-gel tryptic-digested proteins by reducing the number of nonsample masses submitted to protein identification database search engines. Our results demonstrate that mass processing improves the statistical score and rank of putative protein identifications, especially with low-quantity samples, thus increasing the ability to confidently identify proteins with mass spectrometry data.

12.
Commonly accepted intensity-dependent normalization in spotted microarray studies takes account of measurement errors in the differential expression ratio but ignores measurement errors in the total intensity, although the definitions imply the same measurement error components are involved in both statistics. Furthermore, identification of differentially expressed genes is usually considered separately following normalization, which is statistically problematic. By incorporating the measurement errors in both total intensities and differential expression ratios, we propose a measurement-error model for intensity-dependent normalization and identification of differentially expressed genes. This model is also flexible enough to incorporate intra-array and inter-array effects. A Bayesian framework is proposed for the analysis of the proposed measurement-error model to avoid the potential risk of using the common two-step procedure. We also propose a Bayesian identification of differentially expressed genes to control the false discovery rate instead of the ad hoc thresholding of the posterior odds ratio. The simulation study and an application to real microarray data demonstrate promising results.

13.

Background

Effectively predicting protein complexes not only helps to understand the structures and functions of proteins and their complexes, but also is useful for diagnosing disease and developing new drugs. Up to now, many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, while ignoring the value of other biological information and the dynamic properties of cellular systems.

Results

In this paper, based on our previous work on CPredictor and CPredictor2.0, we present a new method, called CPredictor3.0, for predicting complexes from PPI networks using both gene expression data and protein functional annotations. The method follows the viewpoint that proteins in the same complex should have roughly similar functions and be active at the same time and place in cellular systems. We first detect active proteins by using gene expression data from different time points, and separately cluster proteins by using gene ontology (GO) functional annotations. Then, for each time point, we intersect the set of active proteins generated from expression data with each protein cluster generated from functional annotations. Each resulting unique set indicates a cluster of proteins that have similar function(s) and are active at that time point. Following that, we map each cluster of active proteins of similar function onto a static PPI network and obtain a series of induced connected subgraphs. We treat these subgraphs as candidate complexes. Finally, the predicted complexes are obtained by expanding and merging these candidate complexes. We evaluate CPredictor3.0 and compare it with a number of existing methods on several PPI networks and benchmark complex datasets. The experimental results show that CPredictor3.0 achieves the highest F1-measure, which indicates that CPredictor3.0 outperforms these existing methods overall.
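
The intersection and mapping steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the CPredictor3.0 implementation: the function names, the `min_size` filter, and the toy network are hypothetical, and the final expanding and merging step is omitted because the abstract does not specify it.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def connected_components(nodes: Set[str], edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the PPI subgraph induced by `nodes`."""
    adj: Dict[str, Set[str]] = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:  # keep only edges inside the induced subgraph
            adj[u].add(v)
            adj[v].add(u)
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:  # depth-first traversal from `start`
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def candidate_complexes(
    active_by_time: Dict[str, Set[str]],
    clusters: List[Set[str]],
    edges: List[Tuple[str, str]],
    min_size: int = 2,  # assumption: ignore single-protein components
) -> Set[FrozenSet[str]]:
    """Intersect active sets with functional clusters, then split by PPI connectivity."""
    found: Set[FrozenSet[str]] = set()
    for active in active_by_time.values():
        for cluster in clusters:
            for comp in connected_components(active & cluster, edges):
                if len(comp) >= min_size:
                    found.add(frozenset(comp))  # the set keeps each candidate once
    return found


# Toy network, activity sets and functional clusters (all hypothetical).
ppi = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
active = {"t1": {"A", "B", "C", "E"}, "t2": {"C", "D", "E", "F"}}
go_clusters = [{"A", "B", "D"}, {"C", "E", "F"}]
candidates = candidate_complexes(active, go_clusters, ppi)
```

In the toy example, C and E share a functional cluster and are both active at t1, but they do not interact in the network, so they do not form a candidate together.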

Conclusion

CPredictor3.0 can serve as a promising tool for protein complex prediction.

14.
The lactating mammary gland utilizes free plasma amino acids as well as those derived by hydrolysis from circulating short-chain peptides for protein synthesis. Apart from the major route of amino acid nitrogen delivery to the gland by the various transporters for free amino acids, it has been suggested that dipeptides may also be taken up in intact form to serve as a source of amino acids. The identification of peptide transporters in the mammary gland may therefore provide new insights into protein metabolism and secretion by the gland. The expression and distribution of the high-affinity type proton-coupled peptide transporter PEPT2 were investigated in rat lactating mammary gland as well as in human epithelial cells derived from breast milk. By use of RT-PCR, PEPT2 mRNA was detected in rat mammary gland extracts and human milk epithelial cells. The expression pattern of PEPT2 mRNA revealed a localization in epithelial cells of ducts and glands by nonisotopic high resolution in situ hybridization. In addition, immunohistochemistry was carried out and showed transporter immunoreactivity in the same epithelial cells of the glands and ducts. Furthermore, two-electrode voltage clamp recordings using PEPT2-expressing Xenopus laevis oocytes demonstrated positive inward currents induced by selected dipeptides that may play a role in amino nitrogen handling in the mammalian mammary gland. Taken together, these data suggest that PEPT2 is expressed in mammary gland epithelia, in which it may contribute to the reuptake of short-chain peptides derived from hydrolysis of milk proteins secreted into the lumen. Because PEPT2 also transports a variety of drugs, such as selected beta-lactams, angiotensin-converting enzyme inhibitors, and antiviral and anticancer metabolites, their efficient reabsorption via PEPT2 may reduce the burden of xenobiotics in milk.

15.
16.
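This review outlines the pulmonary absorption mechanisms of peptide and protein drugs and the research progress on their delivery by inhalation. It also briefly discusses the problems that arise when peptide and protein drugs are administered by inhalation, as well as directions for future development, in order to provide a reference for research on the inhaled delivery of peptide and protein drugs.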

17.
Protein identification using mass spectrometry is an indispensable computational tool in the life sciences. A dramatic increase in the use of proteomic strategies to understand the biology of living systems generates an ongoing need for more effective, efficient, and accurate computational methods for protein identification. A wide range of computational methods, each with various implementations, are available to complement different proteomic approaches. A solid knowledge of the range of algorithms available and, more critically, the accuracy and effectiveness of these techniques is essential to ensure as many of the proteins as possible, within any particular experiment, are correctly identified. Here, we undertake a systematic review of the currently available methods and algorithms for interpreting, managing, and analyzing biological data associated with protein identification. We summarize the advances in computational solutions as they have responded to corresponding advances in mass spectrometry hardware. The evolution of scoring algorithms and metrics for automated protein identification is also discussed, with a focus on the relative performance of different techniques. We also consider the relative advantages and limitations of different techniques in particular biological contexts. Finally, we present our perspective on future developments in the area of computational protein identification by considering the most recent literature on new and promising approaches to the problem as well as identifying areas yet to be explored and the potential application of methods from other areas of computational biology.

18.

Background  

Techniques for measuring protein abundance are rapidly advancing, and we anticipate that many protein abundance data sets will be available in the near future. Since proteins are translated from mRNAs, mRNA expression levels are expected to be related, to some degree, to protein abundance.

19.
Here we perform a systematic exploration of the use of distance constraints derived from small angle X-ray scattering (SAXS) measurements to filter candidate protein structures for the purpose of protein structure prediction. This is an intrinsically more complex task than that of applying distance constraints derived from NMR data where the identity of the pair of amino acid residues subject to a given distance constraint is known. SAXS, on the other hand, yields a histogram of pair distances (pair distribution function), but the identities of the pairs contributing to a given bin of the histogram are not known. Our study is based on an extension of the Levitt-Hinds coarse grained approach to ab initio protein structure prediction to generate a candidate set of C(alpha) backbones. In spite of the lack of specific residue information inherent in the SAXS data, our study shows that the implementation of a SAXS filter is capable of effectively purifying the set of native structure candidates and thus provides a substantial improvement in the reliability of protein structure prediction. We test the quality of our predicted C(alpha) backbones by doing structural homology searches against the Dali domain library, and find that the results are very encouraging. In spite of the lack of local structural details and limited modeling accuracy at the C(alpha) backbone level, we find that useful information about fold classification can be extracted from this procedure. This approach thus provides a way to use a SAXS data based structure prediction algorithm to generate potential structural homologies in cases where lack of sequence homology prevents identification of candidate folds for a given protein. Thus our approach has the potential to help in determination of the biological function of a protein based on structural homology instead of sequence homology.
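
The histogram of pair distances mentioned above can be illustrated with a short sketch. It assumes a candidate structure is given as a list of C(alpha) coordinates and compares its pair-distance histogram with a reference histogram; the function names, the fixed-width binning, and the squared-difference score are assumptions for illustration and are not the filter used in the study.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def pair_distance_histogram(coords: Sequence[Point], bin_width: float, n_bins: int) -> List[float]:
    """Normalized histogram of all pairwise distances between C(alpha) positions.

    The residue identities of each pair are discarded, mirroring the fact that
    a pair distribution function does not say which pairs fill a given bin.
    """
    counts = [0] * n_bins
    for p, q in combinations(coords, 2):
        idx = int(dist(p, q) / bin_width)
        if idx < n_bins:  # distances beyond the last bin are ignored
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def histogram_mismatch(model: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of squared bin differences; lower means closer agreement (hypothetical score)."""
    return sum((m - r) ** 2 for m, r in zip(model, reference))


# Three points forming a 3-4-5 right triangle, binned in steps of 2 length units.
triangle = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
hist = pair_distance_histogram(triangle, bin_width=2.0, n_bins=3)
```

A filter in this spirit would keep only those candidate backbones whose mismatch against the measured histogram falls below some cutoff; how that cutoff is chosen is outside what the abstract states.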

20.