Similar Articles
2.
In shotgun proteomics, high-throughput mass spectrometry experiments and the subsequent data analysis produce thousands to millions of hypothetical peptide identifications. The common way to estimate the false discovery rate (FDR) of peptide identifications is the target-decoy database search strategy, which is efficient and accurate for large datasets. However, the legitimacy of the target-decoy strategy for protein-modification-centric studies has rarely been rigorously validated. It is often the case that a global FDR is estimated for all peptide identifications including both modified and unmodified peptides, but that only a subgroup of identifications with a certain type of modification is focused on. As revealed recently, the subgroup FDR of modified peptide identifications can differ dramatically from the global FDR at the same score threshold, and thus the former, when it is of interest, should be separately estimated. However, rare modifications often result in a very small number of modified peptide identifications, which makes the direct separate FDR estimation inaccurate because of the inadequate sample size. This paper presents a method called the transferred FDR for accurately estimating the FDR of an arbitrary number of modified peptide identifications. Through flexible use of the empirical data from a target-decoy database search, a theoretical relationship between the subgroup FDR and the global FDR is made computable. Through this relationship, the subgroup FDR can be predicted from the global FDR, allowing one to avoid an inaccurate direct estimation from a limited amount of data. The effectiveness of the method is demonstrated with both simulated and real mass spectra.

Post-translational modifications of proteins often play an essential role in the functions of proteins in cells (1). Abnormal modifications can change the properties of proteins, causing serious diseases (2). Because protein modifications are not directly encoded in the nucleotide sequences of organisms, they must be investigated at the protein level. In recent years, mass spectrometry technology has developed rapidly and has become the standard method for identifying proteins and their modifications in biological and clinical samples (3–5).

In shotgun proteomics experiments, proteins are digested into peptide mixtures that are then analyzed via high-throughput liquid chromatography–tandem mass spectrometry, resulting in thousands to millions of tandem mass spectra. To identify the peptide sequences and the modifications on them, the spectra are commonly searched against a protein sequence database (6–8). During the database search, according to the variable modification types specified by the user, all forms of modified candidate peptides are enumerated. For each spectrum, candidate peptides (with possible modifications) from the database are scored according to the quality of their match to the input spectrum. However, for many reasons, the top-scored matches are not always correct peptide identifications, and therefore they must be filtered according to their identification scores (9). Finding an appropriate score threshold that gives the desired false discovery rate (FDR) is a multiple hypothesis testing problem (10–12).

At present, the common way to control the FDR of peptide identifications is an empirical approach called the target-decoy search strategy (13).
In this strategy, in addition to the target protein sequences, the mass spectra are also searched against the same number of decoy protein sequences (e.g. reverse sequences of the target proteins). Because an incorrect identification has an equal chance of being a match to the target sequences or to the decoy sequences, the number of decoy matches above a score threshold can be used as an estimate of the number of random target matches, and the FDR (of the target matches) can be simply estimated as the number of decoy matches divided by the number of target matches. The target-decoy method, although simple and effective, is applicable to large datasets only. When the number of matches being evaluated is very small, this method becomes inaccurate because of the inadequate sample size (13, 14). Fortunately, for high-throughput proteomic mass spectrometry experiments, the number of mass spectra is always sufficiently large. Current efforts are mostly devoted to increasing the sensitivity of peptide identification at a given FDR by using various techniques such as machine learning (15).

When the purpose of an experiment is to search for protein modifications, the problem of FDR estimation becomes somewhat complex. In fact, the legitimacy of the target-decoy method for modification-centric studies was not rigorously discussed until very recently (16). At present, for multiple reasons, the identifications of modified and unmodified peptides are usually combined in the search result, and a global FDR is estimated for them in combination, with only a subgroup of identifications with specific modifications being focused on. However, the FDR of modified peptides can be significantly or even extremely different from that of unmodified peptides at the same score threshold. There are three reasons for this fact. First, because the spectra of modified peptides can have their own features (e.g. insufficient fragmentation or neutral losses), they can have different score distributions from those of unmodified peptides. Second, because the proportions of modified and unmodified peptides in the protein sample are different, the prior probabilities of obtaining a correct identification are different for modified and unmodified peptides. Third, because the proportions of modified and unmodified candidate peptides in the search space are different, the prior probabilities of obtaining an incorrect identification are also different for modified and unmodified peptides. Therefore, the modified peptide identifications of interest should be extracted from the identification result and subjected to a separate FDR estimation, as pointed out recently (16–18).

The difficulty of separate FDR estimations is highlighted when there are too few modified peptide identifications to allow an accurate estimation. Many protein modifications are present in low abundance in cells but play important biological roles. These rare modifications have very low chances of being detected by mass spectrometry. A crucial question is, if very few modifications are identified from a very large dataset of mass spectra, can they be regarded as correct identifications? There was no answer to this question in the past in terms of FDR control. The target-decoy strategy loses its efficacy in such cases. For example, imagine that we have 10 modified peptide identifications above a score threshold after a search and that all of them are matches to target protein sequences. Can we say that the FDR of these identifications is zero (0/10)?
If we decrease the score threshold slightly in such a way that one more modified peptide identification is included but find that that peptide is unfortunately a match to the decoy sequence, then can we say that the FDR of the top 10 target identifications is 10% (1/10)? It is clear here that the inclusion or exclusion of the 11th decoy identification has a great influence on the FDR estimated via the common target-decoy strategy. In fact, according to a binomial model (14), the probability that there are one or more false identifications among the top 10 target matches is as high as 0.5, which means that the real proportion of false discoveries has a half-chance of being no less than 10% (1/10). The appropriate way to estimate the FDR of the 10 target identifications is to give an appropriate estimate of the expected number of false identifications among them, and, most important, this estimate must not be an integer (e.g. 0 or 1) but can be a real number between 0 and 1. Note that single-spectrum significance measures (e.g. p values) are not appropriate for multiple hypothesis testing, not to mention that they can hardly be accurately computed in mass spectrometry.

Separate FDR estimation for grouped multiple hypothesis testing is not new in statistics and bioinformatics. A typical example is the microarray data of mRNAs from different locations in an organism or from genes that are involved in different biological processes (19, 20). Efron (21) recently proposed a method for robust separate FDR estimation for small subgroups in the empirical Bayes framework. The underlying principle of this method is that if we can find the quantitative relationship between the subgroup FDR and the global FDR, the former can be indirectly inferred from the latter instead of being estimated from a limited amount of data. The relationship given by Efron is quite general and makes no use of domain-specific information. Furthermore, it requires known conditional probabilities of null and non-null cases given the score threshold. These probabilities are, however, unavailable in the modified peptide identification problem.

This paper presents a dedicated method for accurate FDR estimation for rare protein modifications detected from large-scale mass spectral data. This method is based on a theoretical relationship between the subgroup FDR of modified peptide identifications and the global FDR of all peptide identifications. To make the relationship computable, the component factors in it are replaced by or fitted from the empirical data of the target-decoy database search results. Most important, the probability that an incorrect identification is an assignment of a modified peptide is approximated by a linear function of the score threshold. By extrapolation, this probability can be reliably obtained for high-tail scores that are suitable as thresholds. The proposed method was validated on both simulated and real mass spectra. To the best of our knowledge, this study is the first effort toward reliable FDR control of rare protein modifications identified from mass spectra. (Note that the error rate control for modification site location is another complex problem (22, 23) and is not the aim of this paper.)
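To make the arithmetic above concrete, the following sketch (Python, with made-up numbers; not the transferred-FDR implementation itself) computes a plain target-decoy FDR and the binomial probability that at least one of a handful of target matches is false, given an assumed per-match error probability.

```python
# Minimal sketch with made-up numbers (not the transferred-FDR method itself).

def global_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy estimate: (#decoy matches >= threshold) / (#target matches >= threshold)."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0

def prob_at_least_one_false(n_matches, p_false):
    """Binomial model: P(>=1 false identification) when each of n_matches
    is independently false with an assumed probability p_false."""
    return 1.0 - (1.0 - p_false) ** n_matches

print(global_fdr([72, 65, 58, 41], [52, 39], threshold=50))   # 1 decoy / 3 targets ~ 0.33
print(prob_at_least_one_false(n_matches=10, p_false=0.067))   # ~ 0.50
```

With only ten matches, even a modest per-match error probability leaves roughly even odds of at least one false identification, which is why the paper transfers information from the global FDR rather than estimating the subgroup FDR directly from so few observations.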

8.
Protein–protein interactions (PPIs) are fundamental to the structure and function of protein complexes. Resolving the physical contacts between proteins as they occur in cells is critical to uncovering the molecular details underlying various cellular activities. To advance the study of PPIs in living cells, we have developed a new in vivo cross-linking mass spectrometry platform that couples a novel membrane-permeable, enrichable, and MS-cleavable cross-linker with multistage tandem mass spectrometry. This strategy permits the effective capture, enrichment, and identification of in vivo cross-linked products from mammalian cells and thus enables the determination of protein interaction interfaces. The utility of the developed method has been demonstrated by profiling PPIs in mammalian cells at the proteome scale and the targeted protein complex level. Our work represents a general approach for studying in vivo PPIs and provides a solid foundation for future studies toward the complete mapping of PPI networks in living systems.Protein–protein interactions (PPIs)1 play a key role in defining protein functions in biological systems. Aberrant PPIs can have drastic effects on biochemical activities essential to cell homeostasis, growth, and proliferation, and thereby lead to various human diseases (1). Consequently, PPI interfaces have been recognized as a new paradigm for drug development. Therefore, mapping PPIs and their interaction interfaces in living cells is critical not only for a comprehensive understanding of protein function and regulation, but also for describing the molecular mechanisms underlying human pathologies and identifying potential targets for better therapeutics.Several strategies exist for identifying and mapping PPIs, including yeast two-hybrid, protein microarray, and affinity purification mass spectrometry (AP-MS) (25). Thanks to new developments in sample preparation strategies, mass spectrometry technologies, and bioinformatics tools, AP-MS has become a powerful and preferred method for studying PPIs at the systems level (69). Unlike other approaches, AP-MS experiments allow the capture of protein interactions directly from their natural cellular environment, thus better retaining native protein structures and biologically relevant interactions. In addition, a broader scope of PPI networks can be obtained with greater sensitivity, accuracy, versatility, and speed. Despite the success of this very promising technique, AP-MS experiments can lead to the loss of weak/transient interactions and/or the reorganization of protein interactions during biochemical manipulation under native purification conditions. To circumvent these problems, in vivo chemical cross-linking has been successfully employed to stabilize protein interactions in native cells or tissues prior to cell lysis (1016). The resulting covalent bonds formed between interacting partners allow affinity purification under stringent and fully denaturing conditions, consequently reducing nonspecific background while preserving stable and weak/transient interactions (1216). Subsequent mass spectrometric analysis can reveal not only the identities of interacting proteins, but also cross-linked amino acid residues. The latter provides direct molecular evidence describing the physical contacts between and within proteins (17). 
This information can be used for computational modeling to establish structural topologies of proteins and protein complexes (17–22), as well as for generating experimentally derived protein interaction network topology maps (23, 24). Thus, cross-linking mass spectrometry (XL-MS) strategies represent a powerful and emergent technology that possesses unparalleled capabilities for studying PPIs.

Despite their great potential, current XL-MS studies that have aimed to identify cross-linked peptides have been mostly limited to in vitro cross-linking experiments, with few successfully identifying protein interaction interfaces in living cells (24, 25). This is largely because XL-MS studies remain challenging due to the inherent difficulty in the effective MS detection and accurate identification of cross-linked peptides, as well as in unambiguous assignment of cross-linked residues. In general, cross-linked products are heterogeneous and low in abundance relative to non-cross-linked products. In addition, their MS fragmentation is too complex to be interpreted using conventional database searching tools (17, 26). It is noted that almost all of the current in vivo PPI studies utilize formaldehyde cross-linking because of its membrane permeability and fast kinetics (10–16). However, in comparison to the most commonly used amine-reactive NHS ester cross-linkers, identification of formaldehyde cross-linked peptides is even more challenging because of its promiscuous nonspecific reactivity and extremely short spacer length (27). Therefore, further developments in reagents and methods are urgently needed to enable simple MS detection and effective identification of in vivo cross-linked products, and thus allow the mapping of authentic protein contact sites as established in cells, especially for protein complexes.

Various efforts have been made to address the limitations of XL-MS studies, resulting in new developments in bioinformatics tools for improved data interpretation (28–32) and new designs of cross-linking reagents for enhanced MS analysis of cross-linked peptides (24, 33–39). Among these approaches, the development of new cross-linking reagents holds great promise for mapping PPIs on the systems level. One class of cross-linking reagents containing an enrichment handle has been shown to allow selective isolation of cross-linked products from complex mixtures, boosting their detectability by MS (33–35, 40–42). A second class of cross-linkers containing MS-cleavable bonds has proven to be effective in facilitating the unambiguous identification of cross-linked peptides (36–39, 43, 44), as the resulting cross-linked products can be identified based on their characteristic and simplified fragmentation behavior during MS analysis. Therefore, an ideal cross-linking reagent would possess the combined features of both classes of cross-linkers. To advance the study of in vivo PPIs, we have developed a new XL-MS platform based on a novel membrane-permeable, enrichable, and MS-cleavable cross-linker, Azide-A-DSBSO (azide-tagged, acid-cleavable disuccinimidyl bis-sulfoxide), and multistage tandem mass spectrometry (MSn). This new XL-MS strategy has been successfully employed to map in vivo PPIs from mammalian cells at both the proteome scale and the targeted protein complex level.
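The "characteristic and simplified fragmentation behavior" mentioned above can be illustrated with a small sketch: an MS-cleavable linker splits in MS2, leaving each peptide with one of two remnant masses, so cleaved cross-links show up as fragment pairs separated by a fixed mass difference. The remnant masses below are placeholders rather than the actual Azide-A-DSBSO chemistry; real analyses would substitute the reagent's true values.

```python
# Minimal sketch (assumptions, not the authors' pipeline): flag MS2 fragment
# pairs separated by the characteristic remnant mass difference of an
# MS-cleavable cross-linker. Remnant masses are placeholders.

REMNANT_SHORT = 54.0106   # placeholder remnant mass, Da
REMNANT_LONG = 103.9932   # placeholder remnant mass, Da
DELTA = REMNANT_LONG - REMNANT_SHORT

def find_remnant_doublets(ms2_masses, tol=0.01):
    """Return pairs of fragment masses separated by DELTA, a signature
    of a cross-linked peptide whose linker was cleaved during MS2."""
    masses = sorted(ms2_masses)
    pairs = []
    for i, m1 in enumerate(masses):
        for m2 in masses[i + 1:]:
            if abs((m2 - m1) - DELTA) <= tol:
                pairs.append((m1, m2))
    return pairs

# Toy spectrum: two fragments ~49.98 Da apart form a candidate doublet.
print(find_remnant_doublets([800.40, 850.38, 1200.55, 1250.53]))
```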

9.
Previous studies have shown that protein-protein interactions among splicing factors may play an important role in pre-mRNA splicing. We report here identification and functional characterization of a new splicing factor, Sip1 (SC35-interacting protein 1). Sip1 was initially identified by virtue of its interaction with SC35, a splicing factor of the SR family. Sip1 interacts with not only several SR proteins but also with U1-70K and U2AF65, proteins associated with 5′ and 3′ splice sites, respectively. The predicted Sip1 sequence contains an arginine-serine-rich (RS) domain but does not have any known RNA-binding motifs, indicating that it is not a member of the SR family. Sip1 also contains a region with weak sequence similarity to the Drosophila splicing regulator suppressor of white apricot (SWAP). An essential role for Sip1 in pre-mRNA splicing was suggested by the observation that anti-Sip1 antibodies depleted splicing activity from HeLa nuclear extract. Purified recombinant Sip1 protein, but not other RS domain-containing proteins such as SC35, ASF/SF2, and U2AF65, restored the splicing activity of the Sip1-immunodepleted extract. Addition of U2AF65 protein further enhanced the splicing reconstitution by the Sip1 protein. Deficiency in the formation of both A and B splicing complexes in the Sip1-depleted nuclear extract indicates an important role of Sip1 in spliceosome assembly. Together, these results demonstrate that Sip1 is a novel RS domain-containing protein required for pre-mRNA splicing and that the functional role of Sip1 in splicing is distinct from those of known RS domain-containing splicing factors.Pre-mRNA splicing takes place in spliceosomes, the large RNA-protein complexes containing pre-mRNA, U1, U2, U4/6, and U5 small nuclear ribonucleoprotein particles (snRNPs), and a large number of accessory protein factors (for reviews, see references 21, 22, 37, 44, and 48). It is increasingly clear that the protein factors are important for pre-mRNA splicing and that studies of these factors are essential for further understanding of molecular mechanisms of pre-mRNA splicing.Most mammalian splicing factors have been identified by biochemical fractionation and purification (3, 15, 19, 3136, 45, 6971, 73), by using antibodies recognizing splicing factors (8, 9, 16, 17, 61, 66, 67, 74), and by sequence homology (25, 52, 74).Splicing factors containing arginine-serine-rich (RS) domains have emerged as important players in pre-mRNA splicing. These include members of the SR family, both subunits of U2 auxiliary factor (U2AF), and the U1 snRNP protein U1-70K (for reviews, see references 18, 41, and 59). Drosophila alternative splicing regulators transformer (Tra), transformer 2 (Tra2), and suppressor of white apricot (SWAP) also contain RS domains (20, 40, 42). RS domains in these proteins play important roles in pre-mRNA splicing (7, 71, 75), in nuclear localization of these splicing proteins (23, 40), and in protein-RNA interactions (56, 60, 64). Previous studies by us and others have demonstrated that one mechanism whereby SR proteins function in splicing is to mediate specific protein-protein interactions among spliceosomal components and between general splicing factors and alternative splicing regulators (1, 1a, 6, 10, 27, 63, 74, 77). Such protein-protein interactions may play critical roles in splice site recognition and association (for reviews, see references 4, 18, 37, 41, 47 and 59). 
Specific interactions among the splicing factors also suggest that it is possible to identify new splicing factors by their interactions with known splicing factors.

Here we report identification of a new splicing factor, Sip1, by its interaction with the essential splicing factor SC35. The predicted Sip1 protein sequence contains an RS domain and a region with sequence similarity to the Drosophila splicing regulator, SWAP. We have expressed and purified recombinant Sip1 protein and raised polyclonal antibodies against the recombinant Sip1 protein. The anti-Sip1 antibodies specifically recognize a protein migrating at a molecular mass of approximately 210 kDa in HeLa nuclear extract. The anti-Sip1 antibodies sufficiently deplete Sip1 protein from the nuclear extract, and the Sip1-depleted extract is inactive in pre-mRNA splicing. Addition of recombinant Sip1 protein can partially restore splicing activity to the Sip1-depleted nuclear extract, indicating an essential role of Sip1 in pre-mRNA splicing. Other RS domain-containing proteins, including SC35, ASF/SF2, and U2AF65, cannot substitute for Sip1 in reconstituting splicing activity of the Sip1-depleted nuclear extract. However, addition of U2AF65 further increases splicing activity of Sip1-reconstituted nuclear extract, suggesting that there may be a functional interaction between Sip1 and U2AF65 in nuclear extract.

11.
Bone samples from several vertebrates were collected from the Ziegler Reservoir fossil site, in Snowmass Village, Colorado, and processed for proteomics analysis. The specimens come from Pleistocene megafauna Bison latifrons, dating back ∼120,000 years. Proteomics analysis using a simplified sample preparation procedure and tandem mass spectrometry (MS/MS) was applied to obtain protein identifications. Several bioinformatics resources were used to obtain peptide identifications based on sequence homology to extant species with annotated genomes. With the exception of soil sample controls, all samples resulted in confident peptide identifications that mapped to type I collagen. In addition, we analyzed a specimen from the extinct B. latifrons that yielded peptide identifications mapping to over 33 bovine proteins. Our analysis resulted in extensive fibrillar collagen sequence coverage, including the identification of posttranslational modifications. Hydroxylysine glucosylgalactosylation, a modification thought to be involved in collagen fiber formation and bone mineralization, was identified for the first time in an ancient protein dataset. Meta-analysis of data from other studies indicates that this modification may be common in well-preserved prehistoric samples. Additional peptide sequences from extracellular matrix (ECM) and non-ECM proteins have also been identified for the first time in ancient tissue samples. These data provide a framework for analyzing ancient protein signatures in well-preserved fossil specimens, while also contributing novel insights into the molecular basis of organic matter preservation. As such, this analysis has unearthed common posttranslational modifications of collagen that may assist in its preservation over time. The data are available via ProteomeXchange with identifier PXD001827.During the last decade, paleontology and taphonomy (the study of decaying organisms over time and the fossilization processes) have begun to overlap with the field of proteomics to shed new light on preserved organic matter in fossilized bones (14). These bones represent a time capsule of ancient biomolecules, owing to their natural resistance to post mortem decay arising from a unique combination of mechanical, structural, and chemical properties (47).Although bones can be cursorily described as a composite of collagen (protein) and hydroxyapatite (mineral), fossilized bones undergo three distinct diagenesis pathways: (i) chemical deterioration of the organic phase; (ii) chemical deterioration of the mineral phase; and (iii) (micro)biological attack of the composite (6). In addition, the rate of these degradation pathways are affected by temperature, as higher burial temperatures have been shown to accelerate these processes (6, 8). Though relatively unusual, the first of these three pathways results in a slower deterioration process, which is more generally mitigated under (6) specific environmental constraints, such as geochemical stability (stable temperature and acidity) that promote bone mineral preservation. Importantly, slower deterioration results in more preserved biological materials that are more amenable to downstream analytical assays. One example of this is the controversial case of bone and soft-tissue preservation from the Cretaceous/Tertiary boundary (922). 
In light of these and other studies of ancient biomolecules, paleontological models have proposed that organic biomolecules in ancient samples, such as collagen sequences from the 80 million-year-(my)-old Campanian hadrosaur, Brachylophosaurus canadensis (16) or 68-my-old Tyrannosaurus rex, might be protected by the microenvironment within bones. Such spaces are believed to form a protective shelter that is able to reduce the effects of diagenetic events. In addition to collagen, preserved biomolecules include blood proteins, cellular lipids, and DNA (4, 5). While the maximum estimated lifespan of DNA in bones is ∼20,000 years (ky) at 10 °C, bone proteins have an even longer lifespan, making them an exceptional target for analysis to gain relevant insights into fossilized samples (6). Indeed, the survival of collagen, which is considered to be the most abundant bone protein, is estimated to be in the range 340 ky at 20 °C. Similarly, osteocalcin, the second-most abundant bone protein, can persist for ≈45 ky at 20 °C, thus opening an unprecedented analytical window to study extremely old samples (2, 4, 23).Although ancient DNA amplification and sequencing can yield interesting clues and potential artifacts from contaminating agents (7, 2428), the improved preservation of ancient proteins provides access to a reservoir of otherwise unavailable genetic information for phylogenetic inference (25, 29, 30). In particular, mass spectrometry (MS)-based screening of species-specific collagen peptides has recently been used as a low-cost, rapid alternative to DNA sequencing for taxonomic attribution of morphologically unidentifiable small bone fragments and teeth stemming from diverse archeological contexts (25, 3133).For over five decades, researchers have presented biochemical evidence for the existence of preserved protein material from ancient bone samples (3436). One of the first direct measurements was by amino acid analysis, which showed that the compositional profile of ancient samples was consistent with collagens in modern bone samples (3739). Preservation of organic biomolecules, either from bone, dentin, antlers, or ivory, has been investigated by radiolabeled 14C fossil dating (40) to provide an avenue of delineating evolutionary divergence from extant species (3, 41, 42). It is also important to note that these parameters primarily depend on ancient bone collagen as the levels remain largely unchanged (a high percentage of collagen is retained, as gleaned by laboratory experiments on bone taphonomy (6)). Additionally, antibody-based immunostaining methods have given indirect evidence of intact peptide amide bonds (4345) to aid some of the first evidence of protein other than collagen and osteocalcin in ancient mammoth (43) and human specimens (46).In the past, mass spectrometry has been used to obtain MS signals consistent with modern osteocalcin samples (2, 47), and eventually postsource decay peptide fragmentation was used to confirm the identification of osteocalcin in fossil hominids dating back ∼75 ky (48). More recently, modern “bottom-up” proteomic methods were applied to mastodon and T. rex samples (10), complementing immunohistochemistry evidence (13, 17). The results hinted at the potential of identifying peptides from proteolytic digest of well-preserved bone samples. This work also highlighted the importance of minimizing sources of protein contamination and adhering to data publication guidelines (20, 21). 
In the past few years, a very well-preserved juvenile mammoth referred to as Lyuba was discovered in the Siberian permafrost and analyzed using high-resolution tandem mass spectrometry (29). This study was followed by a report by Wadsworth and Buckley (30) describing the analysis of proteins from 19 bovine bone samples spanning 4 ky to 1.5 my. Both of these groups reported the identification of additional collagen and noncollagen proteins.

Recently, a series of large extinct mammal bones were unearthed at a reservoir near Snowmass Village, Colorado, USA (49, 50). The finding was made during a construction project at the Ziegler Reservoir, a fossil site that was originally a lake formed at an elevation of ∼2,705 m during the Bull Lake glaciations ∼140 ky ago (49, 51). The original lake area was ∼5 hectares in size with a total catchment of ∼14 hectares and lacked a direct water flow inlet or outlet. This closed drainage basin established a relatively unique environment that resulted in the exceptional preservation of plant material, insects (52), and vertebrate bones (49). In particular, a cranial specimen from extinct Bison latifrons was unearthed from the Biostratigraphic Zone/Marine Oxygen Isotope Stage (MIS) 5d, which dates back to ∼120 ky (53, 54).

Here, we describe the use of paleoproteomics for the identification of protein remnants, with a focus on a particularly unique B. latifrons cranial specimen found at the Ziegler site. We developed a simplified sample processing approach that allows for analysis of low milligram quantities of ancient samples for peptide identification. Our method avoids the extensive demineralization steps of traditional protocols and utilizes an acid-labile detergent to allow for efficient extraction and digestion without the need for additional sample cleanup steps. This approach was applied to a specimen from B. latifrons that displayed visual and mechanical properties consistent with the meninges, a fibrous tissue that lines the cranial cavity. Bioinformatics analysis revealed the presence of a recurring glycosylation signature in well-preserved collagens. In particular, the presence of glycosylated hydroxylysine residues was identified as a unique feature of bone fossil collagen, as gleaned through meta-analyses of raw data from previous reports on woolly mammoth (Mammuthus primigenius) and bovine samples (29, 30). The results from these meta-analyses indicate a common, unique feature of collagen that coincides with, and possibly contributes to, its preservation.
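As a small illustration of how the reported collagen sequence coverage is computed, the following sketch (hypothetical peptides and reference stretch) maps identified peptides back onto a protein sequence and reports the percentage of residues covered.

```python
# Minimal sketch (hypothetical data): percent sequence coverage of a reference
# protein from a list of identified peptide sequences.

def sequence_coverage(protein_seq, peptides):
    covered = [False] * len(protein_seq)
    for pep in peptides:
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein_seq)

# Toy example with a made-up collagen-like stretch and two identified peptides.
ref = "GPPGPAGPPGPPGARGPAGPQGPRGDKGETGEQ"
print(round(sequence_coverage(ref, ["GPPGPAGPPGPPGAR", "GDKGETGEQ"]), 1))
```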

12.
It remains extraordinarily challenging to elucidate endogenous protein-protein interactions and proximities within the cellular milieu. The dynamic nature and the large range of affinities of these interactions augment the difficulty of this undertaking. Among the most useful tools for extracting such information are those based on affinity capture of target bait proteins in combination with mass spectrometric readout of the co-isolated species. Although highly enabling, the utility of affinity-based methods is generally limited by difficulties in distinguishing specific from nonspecific interactors, preserving and isolating all unique interactions including those that are weak, transient, or rapidly exchanging, and differentiating proximal interactions from those that are more distal. Here, we have devised and optimized a set of methods to address these challenges. The resulting pipeline involves flash-freezing cells in liquid nitrogen to preserve the cellular environment at the moment of freezing; cryomilling to fracture the frozen cells into intact micron chunks to allow for rapid access of a chemical reagent and to stabilize the intact endogenous subcellular assemblies and interactors upon thawing; and utilizing the high reactivity of glutaraldehyde to achieve sufficiently rapid stabilization at low temperatures to preserve native cellular interactions. In the course of this work, we determined that relatively low molar ratios of glutaraldehyde to reactive amines within the cellular milieu were sufficient to preserve even labile and transient interactions. This mild treatment enables efficient and rapid affinity capture of the protein assemblies of interest under nondenaturing conditions, followed by bottom-up MS to identify and quantify the protein constituents. For convenience, we have termed this approach Stabilized Affinity Capture Mass Spectrometry. Here, we demonstrate that Stabilized Affinity Capture Mass Spectrometry allows us to stabilize and elucidate local, distant, and transient protein interactions within complex cellular milieux, many of which are not observed in the absence of chemical stabilization.Insights into many cellular processes require detailed information about interactions between the participating proteins. However, the analysis of such interactions can be challenging because of the often-diverse physicochemical properties and the abundances of the constituent proteins, as well as the sometimes wide range of affinities and complex dynamics of the interactions. One of the key challenges has been acquiring information concerning transient, low affinity interactions in highly complex cellular milieux (3, 4).Methods that allow elucidation of such information include co-localization microscopy (5), fluorescence protein Förster resonance energy transfer (4), immunoelectron microscopy (5), yeast two-hybrid (6), and affinity capture (7, 8). Among these, affinity capture (AC)1 has the unique potential to detect all specific in vivo interactions simultaneously, including those that interact both directly and indirectly. In recent times, the efficacy of such affinity isolation experiments has been greatly enhanced through the use of sensitive modern mass spectrometric protein identification techniques (9). Nevertheless, AC suffers from several shortcomings. 
These include the problem of 1) distinguishing specific from nonspecific interactors (10, 11); 2) preserving and isolating all unique interactions including those that are weak and/or transient, as well as those that exchange rapidly (10, 12, 13); and 3) differentiating proximal from more distant interactions (14).

We describe here an approach to address these issues, which makes use of chemical stabilization of protein assemblies in the complex cellular milieu prior to AC. Chemical stabilization is an emerging technique for stabilizing and elucidating protein associations both in vitro (15–20) and in vivo (3, 12, 14, 21–29), with mass spectrometric (MS) readout of the AC proteins and their connectivities. Such chemical stabilization methods are indeed well-established and are often used in electron microscopy for preserving complexes and subcellular structures both in the cellular milieu (3) and in purified complexes (30, 31), wherein the most reliable, stable, and established stabilization reagent is glutaraldehyde. Recently, glutaraldehyde has been applied in the “GraFix” protocol in which purified protein complexes are subjected to centrifugation through a density gradient that also contains a gradient of glutaraldehyde (30, 31), allowing for optimal stabilization of authentic complexes and minimization of nonspecific associations and aggregation. GraFix has also been combined with mass spectrometry on purified complexes bound to EM grids to obtain a compositional analysis of the complexes (32), thereby raising the possibility that glutaraldehyde can be successfully utilized in conjunction with AC in complex cellular milieux directly.

In this work, we present a robust pipeline for determining specific protein-protein interactions and proximities from cellular milieux. The first steps of the pipeline involve the well-established techniques of flash freezing the cells of interest in liquid nitrogen and cryomilling, which have been known for over a decade (33, 34) to preserve the cellular environment, as well as having shown outstanding performance when used in analysis of macromolecular interactions in yeast (35–39), bacterial (40, 41), trypanosome (42), mouse (43), and human (44–47) systems. The resulting frozen powder, composed of intact micron chunks of cells that have great surface area and outstanding solvent accessibility, is well suited for rapid low temperature chemical stabilization using glutaraldehyde. We selected glutaraldehyde for our procedure based on the fact that it is a very reactive stabilizing reagent, even at lower temperatures, and because it has already been shown to stabilize enzymes in their functional state (48–50). We employed highly efficient, rapid, single stage affinity capture (36, 51) for isolation and bottom-up MS for analysis of the macromolecular assemblies of interest (52–54). For convenience, we have termed this approach Stabilized Affinity Capture Mass Spectrometry (SAC-MS).
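One common way to address the first challenge listed above, distinguishing specific from nonspecific interactors, is to compare the bait capture against a mock capture. The sketch below uses hypothetical spectral counts and is not the authors' scoring scheme; it simply flags proteins by fold enrichment over the control.

```python
# Minimal sketch (a generic approach, not necessarily the authors' scoring):
# flag candidate specific interactors by their enrichment in a bait affinity
# capture relative to a mock/control capture, using spectral counts.

def enrichment(bait_counts, control_counts, pseudocount=1.0, min_fold=5.0):
    """Return {protein: fold_enrichment} for proteins passing a fold cutoff."""
    hits = {}
    for protein, n_bait in bait_counts.items():
        n_ctrl = control_counts.get(protein, 0)
        fold = (n_bait + pseudocount) / (n_ctrl + pseudocount)
        if fold >= min_fold:
            hits[protein] = round(fold, 1)
    return hits

# Hypothetical counts: the bait's partners stand out; sticky background does not.
bait = {"BaitX": 120, "PartnerA": 35, "PartnerB": 12, "HSPA8": 18}
ctrl = {"HSPA8": 15, "PartnerB": 1}
print(enrichment(bait, ctrl))
```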

14.
Quantitative proteome analyses suggest that the well-established stain colloidal Coomassie Blue, when used as an infrared dye, may provide sensitive, post-electrophoretic in-gel protein detection that can rival even Sypro Ruby. Considering the central role of two-dimensional gel electrophoresis in top-down proteomic analyses, a more cost effective alternative such as Coomassie Blue could prove an important tool in ongoing refinements of this important analytical technique. To date, no systematic characterization of Coomassie Blue infrared fluorescence detection relative to detection with SR has been reported. Here, seven commercial Coomassie stain reagents and seven stain formulations described in the literature were systematically compared. The selectivity, threshold sensitivity, inter-protein variability, and linear-dynamic range of Coomassie Blue infrared fluorescence detection were assessed in parallel with Sypro Ruby. Notably, several of the Coomassie stain formulations provided infrared fluorescence detection sensitivity to <1 ng of protein in-gel, slightly exceeding the performance of Sypro Ruby. The linear dynamic range of Coomassie Blue infrared fluorescence detection was found to significantly exceed that of Sypro Ruby. However, in two-dimensional gel analyses, because of a blunted fluorescence response, Sypro Ruby was able to detect a few additional protein spots, amounting to 0.6% of the detected proteome. Thus, although both detection methods have their advantages and disadvantages, differences between the two appear to be small. Coomassie Blue infrared fluorescence detection is thus a viable alternative for gel-based proteomics, offering detection comparable to Sypro Ruby, and more reliable quantitative assessments, but at a fraction of the cost.Gel electrophoresis is an accessible, widely applicable and mature protein resolving technology. As the original top-down approach to proteomic analyses, among its many attributes the high resolution achievable by two dimensional gel-electrophoresis (2DE)1 ensures that it remains an effective analytical technology despite the appearance of alternatives. However, in-gel detection remains a limiting factor for gel-based analyses; available technology generally permits the detection and quantification of only relatively abundant proteins (35). Many critical components in normal physiology and also disease may be several orders of magnitude less abundant and thus below the detection threshold of in-gel stains, or indeed most techniques. Pre- and post-fractionation technologies have been developed to address this central issue in proteomics but these are not without limitations (15). Thus improved detection methods for gel-based proteomics continue to be a high priority, and the literature is rich with different in-gel detection methods and innovative improvements (634). This history of iterative refinement presents a wealth of choices when selecting a detection strategy for a gel-based proteomic analysis (35).Perhaps the best known in-gel detection method is the ubiquitous Coomassie Blue (CB) stain; CB has served as a gel stain and protein quantification reagent for over 40 years. Though affordable, robust, easy to use, and compatible with mass spectrometry (MS), CB staining is relatively insensitive. In traditional organic solvent formulations, CB detects ∼ 10 ng of protein in-gel, and some reports suggest poorer sensitivity (27, 29, 36, 37). 
Sensitivity is hampered by relatively high background staining because of nonspecific retention of dye within the gel matrix (32, 36, 38, 39). The development of colloidal CB (CCB) formulations largely addressed these limitations (12); the concentration of soluble CB was carefully controlled by sequestering the majority of the dye into colloidal particles, mediated by pH, solvent, and the ionic strength of the solution. Minimizing soluble dye concentration and penetration of the gel matrix mitigated background staining, and the introduction of phosphoric acid into the staining reagent enhanced dye-protein interactions (8, 12, 40), contributing to an in-gel staining sensitivity of 5–10 ng protein, with some formulations reportedly yielding sensitivities of 0.1–1 ng (8, 12, 22, 39, 41, 42). Thus CCB achieved higher sensitivity than traditional CB staining, yet maintained all the advantages of the latter, including low cost and compatibility with existing densitometric detection instruments and MS. Although surpassed by newer methods, the practical advantages of CCB ensure that it remains one of the most common gel stains in use.Fluorescent stains have become the routine and sensitive alternative to visible dyes. Among these, the ruthenium-organometallic family of dyes have been widely applied and the most commercially well-known is Sypro Ruby (SR), which is purported to interact noncovalently with primary amines in proteins (15, 18, 19, 43). Chief among the attributes of these dyes is their high sensitivity. In-gel detection limits of < 1 ng for some proteins have been reported for SR (6, 9, 14, 44, 45). Moreover, SR staining has been reported to yield a greater linear dynamic range (LDR), and reduced interprotein variability (IPV) compared with CCB and silver stains (15, 19, 4649). SR is easy to use, fully MS compatible, and relatively forgiving of variations in initial conditions (6, 15). The chief consequence of these advances remains high cost; SR and related stains are notoriously expensive, and beyond the budget of many laboratories. Furthermore, despite some small cost advantage relative to SR, none of the available alternatives has been consistently and quantitatively demonstrated to substantially improve on the performance of SR under practical conditions (9, 50).Notably, there is evidence to suggest that CCB staining is not fundamentally insensitive, but rather that its sensitivity has been limited by traditional densitometric detection (50, 51). When excited in the near IR at ∼650 nm, protein-bound CB in-gel emits light in the range of 700–800 nm. Until recently, the lack of low-cost, widely available and sufficiently sensitive infrared (IR)-capable imaging instruments prevented mainstream adoption of in-gel CB infrared fluorescence detection (IRFD); advances in imaging technology are now making such instruments far more accessible. Initial reports suggested that IRFD of CB-stained gels provided greater sensitivity than traditional densitometric detection (50, 51). Using CB R250, in-gel IRFD was reported to detect as little as 2 ng of protein in-gel, with a LDR of about an order of magnitude (2 to 20 ng, or 10 to 100 ng in separate gels), beyond which the fluorescent response saturated into the μg range (51). Using the G250 dye variant, it was determined that CB-IRFD of 2D gels detected ∼3 times as many proteins as densitometric imaging, and a comparable number of proteins as seen by SR (50). 
This study also concluded that CB-IRFD yielded a significantly higher signal to background ratio (S/BG) than SR, providing initial evidence that CB-IRFD may be superior to SR in some aspects of stain performance (50).

Despite this initial evidence of the viability of CB-IRF as an in-gel protein detection method, a detailed characterization of this technology has not yet been reported. Here, a more thorough, quantitative characterization of CB-IRFD is described, establishing its lowest limit of detection (LLD), IPV, and LDR in comparison to SR. Finally, a wealth of modifications and enhancements of CCB formulations have been reported (8, 12, 21, 24, 26, 29, 40, 41, 52–54), and likewise there are many commercially available CCB stain formulations. To date, none of these formulations have been compared quantitatively in terms of their relative performance when detected using IRF. As a general detection method for gel-based proteomics, CB-IRFD was found to provide comparable or even slightly superior performance to SR according to most criteria, including sensitivity and selectivity (50). Furthermore, in terms of LDR, CB-IRFD showed distinct advantages over SR. However, assessing proteomes resolved by 2DE revealed critical distinctions between CB-IRFD and SR in terms of protein quantification versus threshold detection: neither stain could be considered unequivocally superior to the other by all criteria. Nonetheless, IRFD proved the most sensitive method of detecting CB-stained protein in-gel, enabling high sensitivity detection without the need for expensive reagents or even commercial formulations. Overall, CB-IRFD is a viable alternative to SR and other mainstream fluorescent stains, mitigating the high cost of large-scale gel-based proteomic analyses, making high sensitivity gel-based proteomics accessible to all labs. With improvements to CB formulations and/or image acquisition instruments, the performance of this detection technology may be further enhanced.
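For illustration, the sketch below shows one way the lowest limit of detection and linear dynamic range could be estimated from a dilution series. The detection criterion (blank mean plus three standard deviations) and the 20% linearity tolerance are assumptions for the example, not the authors' exact procedure, and the numbers are hypothetical.

```python
# Minimal sketch (assumed conventions, hypothetical numbers): estimate the lowest
# limit of detection (LLD) and linear dynamic range (LDR) from a dilution series.
import statistics

def lld(amounts, signals, blank_signals):
    """Smallest amount whose signal exceeds blank mean + 3*SD."""
    cutoff = statistics.mean(blank_signals) + 3 * statistics.stdev(blank_signals)
    detected = [a for a, s in zip(amounts, signals) if s > cutoff]
    return min(detected) if detected else None

def linear_dynamic_range(amounts, signals, max_dev=0.2):
    """Span (low, high) over which the response stays within max_dev of a line
    fit through the two lowest points (assumes blank-subtracted signals)."""
    slope = (signals[1] - signals[0]) / (amounts[1] - amounts[0])
    upper = amounts[0]
    for a, s in zip(amounts, signals):
        expected = slope * a
        if expected and abs(s - expected) / expected <= max_dev:
            upper = a
        else:
            break
    return (amounts[0], upper)

amounts = [1, 2, 5, 10, 50, 100, 500]         # ng protein loaded (hypothetical)
signals = [12, 25, 60, 118, 600, 1150, 3200]  # background-subtracted intensities
print(lld(amounts, signals, blank_signals=[3, 4, 5, 3]))  # -> 1
print(linear_dynamic_range(amounts, signals))             # -> (1, 100), saturates above
```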

19.
This paper proposes a novel, automated method for evaluating sets of proteins identified using mass spectrometry. The remaining peptide-spectrum match score distributions of protein sets are compared to an empirical absent peptide-spectrum match score distribution, and a Bayesian non-parametric method reminiscent of the Dirichlet process is presented to accurately perform this comparison. Thus, for a given protein set, the process computes the likelihood that the proteins identified are correctly identified. First, the method is used to evaluate protein sets chosen using different protein-level false discovery rate (FDR) thresholds, assigning each protein set a likelihood. The protein set assigned the highest likelihood is used to choose a non-arbitrary protein-level FDR threshold. Because the method can be used to evaluate any protein identification strategy (and is not limited to mere comparisons of different FDR thresholds), we subsequently use the method to compare and evaluate multiple simple methods for merging peptide evidence over replicate experiments. The general statistical approach can be applied to other types of data (e.g. RNA sequencing) and generalizes to multivariate problems.Mass spectrometry is the predominant tool for characterizing complex protein mixtures. Using mass spectrometry, a heterogeneous protein sample is digested into peptides, which are separated by various features (e.g. retention time and mass-to-charge ratio), and fragmented to produce a large collection of spectra; these fragmentation spectra are matched to peptide sequences, and the peptide-spectrum matches (PSMs)1 are scored (1). PSM scores from different peptide search engines and replicate experiments can be assembled to produce consensus scores for each peptide (2, 3). These peptide search results are then used to identify proteins (4).Inferring the protein content from these fragment ion spectra is difficult, and statistical methods have been developed with that goal. Protein identification methods (58) rank proteins according to the probability of their being present in the sample. Complementary target-decoy methods evaluate the proteins identified by searching fragmentation spectra against proteins that might be present (targets) and proteins that are absent (decoys). An identified target protein counts as a correct identification (increasing the estimated sensitivity), whereas each identified decoy protein counts as an incorrect identification (lowering the estimated specificity).Current target-decoy methods estimate the protein-level false discovery rate (FDR) for a set of identified proteins (9, 10), as well as the sensitivity at a particular arbitrary FDR threshold (11); however, these methods have two main shortcomings.First, current methods introduce strong statistical biases, which can be conservative (10) or optimistic (12) in different settings. These biases make current approaches unreliable for comparing different identification methods, because they implicitly favor methods that use similar assumptions. Automated evaluation tools that can be run without user-defined parameters are necessary in order to compare and improve existing analysis tools (13).Second, existing evaluation methods do not produce a single quality measure; instead, they estimate both FDR and sensitivity (which is estimated using the “absolute sensitivity,” which treats all targets as present and counts them as true identifications). For data sets with known protein contents (e.g. 
the protein standard data set considered), the absolute sensitivity is estimable; however, for more complex data sets with unknown contents, the measurement indicates the relative sensitivity. Even if one ignores statistical biases, there is currently no method for choosing a non-arbitrary FDR threshold, and it is currently not possible to decide which protein set is superior—one with a lower sensitivity and stricter FDR, or another with a higher sensitivity and less stringent FDR. The former is currently favored but might result in significant information loss. Arbitrary thresholds have significant effects: in the yeast data analyzed, 1% and 5% FDR thresholds, respectively, yielded 1289 and 1570 identified protein groups (grouping is discussed in the supplementary “Methods” section). Even with such a simple data set, this subtle change results in 281 more target identifications, of which unknown subsets of 66 (0.05 × 1570 − 0.01 × 1289 ≈ 66) are expected to be false identifications and 215 are expected to be true identifications (281 − 66 = 215).

Here we introduce the non-parametric cutout index (npCI), a novel, automated target-decoy method that can be used to compute a single robust and parameter-free quality measure for protein identifications. Our method does not require prior expertise in order for the user to select parameters or run the computation. The npCI employs target-decoy analysis at the PSM level, where its assumptions are more applicable (4). Rather than use assumptions to model PSM scores matching present proteins, our method remains agnostic to the characteristics of present proteins and analyzes PSMs not explained by the identified proteins. If the correct present set of proteins is known, then the distribution of remaining, unexplained PSM scores resembles the decoy distribution (14). We extend this idea and present a general graphical framework to evaluate a set of protein identifications by computing the likelihood that the remaining PSMs and decoy PSMs are drawn from the same distribution (Fig. 1).

Fig. 1. Schematic for non-parametric probabilistic evaluation of identified proteins. Under the supposition that the identified protein set (blue) is present, all peptides matching those proteins (also blue) might be present and have an unknown score distribution. When the correct set of proteins is identified, the remaining peptides (i.e. those not matching any shaded proteins in this figure) have a score distribution resembling that of absent peptides. Thus, the similarity of the remaining peptide score distribution (red dashed line) to the absent peptide score distribution (black solid line) determines the quality of the identified proteins.

Existing non-parametric statistical tests evaluating the similarity between two collections of samples (e.g. Kolmogorov–Smirnov test, used in Ref. 14, and the Wilcoxon signed rank test) were inadequate because infrequent but significant outliers (e.g. high-scoring PSMs) are largely ignored by these methods.
Likewise, information-theoretic measures, such as the Kullback–Leibler divergence, were inadequate because they require a prior on the smoothing parameter that weighs more smoothing and higher similarity against less smoothing and lower similarity (a problem reminiscent of the original compromise between sensitivity and FDR); without the application of such a prior, the optimal Kullback–Leibler divergence occurs with infinite smoothing, which will make any distributions equal, rendering them completely uninformative and thus making it impossible to distinguish one identified protein set from another. For these reasons, we derived a novel, Bayesian, non-parametric process to compute the likelihood that two continuous collections are drawn from the same distribution. It can be used to provide a robust and efficient evaluation of discoveries.
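A minimal sketch of the evaluation idea follows: gather the PSMs left unexplained by a candidate protein set and compare their score distribution with the decoy PSM scores. A two-sample Kolmogorov–Smirnov statistic is used here only as a placeholder comparison; as discussed above, the paper argues such tests are inadequate and derives a Bayesian non-parametric likelihood instead.

```python
# Minimal sketch of the evaluation idea only (KS statistic as a placeholder,
# not the paper's Bayesian non-parametric process).
from scipy.stats import ks_2samp

def remaining_vs_decoy(psms, identified_proteins, decoy_scores):
    """psms: list of (score, set_of_candidate_proteins). Compare scores of PSMs
    unexplained by the identified protein set against decoy PSM scores."""
    remaining = [score for score, prots in psms
                 if not (prots & identified_proteins)]
    return ks_2samp(remaining, decoy_scores)

# Hypothetical toy data: three PSMs, one explained by the identified protein "P1".
psms = [(42.0, {"P1"}), (12.5, {"P2", "P3"}), (10.1, {"P4"})]
print(remaining_vs_decoy(psms, identified_proteins={"P1"},
                         decoy_scores=[11.0, 9.8, 13.2, 10.5]))
```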

20.
A decoding algorithm is tested that mechanistically models the progressive alignments that arise as the mRNA moves past the rRNA tail during translation elongation. Each of these alignments provides an opportunity for hybridization between the single-stranded, 3′-terminal nucleotides of the 16S rRNA and the spatially accessible window of mRNA sequence, from which a free energy value can be calculated. Using this algorithm we show that a periodic, energetic pattern of frequency 1/3 is revealed. This periodic signal exists in the majority of coding regions of eubacterial genes, but not in the non-coding regions encoding the 16S and 23S rRNAs. Signal analysis reveals that the population of coding regions of each bacterial species has a mean phase that is correlated in a statistically significant way with species (G+C) content. These results suggest that the periodic signal could function as a synchronization signal for the maintenance of reading frame and that codon usage provides a mechanism for manipulation of signal phase.
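A rough sketch of the idea follows, with a simple base-pairing count standing in for the free energy calculation and an FFT used to read out the 1/3-frequency component; the toy sequences are hypothetical.

```python
# Minimal sketch of the idea (not the paper's thermodynamic model): score each
# alignment of the rRNA tail against a sliding mRNA window, then look for a
# period-3 component in the resulting signal. A base-pair count stands in for
# the hybridization free energy used in the paper.
import numpy as np

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def window_scores(mrna, tail):
    """Complementarity score of the tail at each position along the mRNA."""
    w = len(tail)
    return np.array([sum((tail[i], mrna[p + i]) in PAIRS for i in range(w))
                     for p in range(len(mrna) - w + 1)], dtype=float)

def period3_power(signal):
    """Fraction of (non-DC) spectral power at frequency 1/3 cycles per base."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0)
    k = np.argmin(np.abs(freqs - 1.0 / 3.0))
    return spectrum[k] / spectrum.sum()

mrna = "AUGGCUAAAGGUGAAUUUGCUCGUAAAGCUGGUAAA"  # toy coding sequence
tail = "CCUCC"                                  # anti-SD-like fragment of the 16S rRNA tail
print(round(period3_power(window_scores(mrna, tail)), 3))
```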
