Similar literature
 20 similar articles found (search time: 15 ms)
1.
Pittelkow Y  Wilson SR 《Biometrics》2005,61(2):630-2; discussion 632-4
This note is in response to Wouters et al. (2003, Biometrics 59, 1131-1139) who compared three methods for exploring gene expression data. Contrary to their summary that principal component analysis is not very informative, we show that it is possible to determine principal component analyses that are useful for exploratory analysis of microarray data. We also present another biplot representation, the GE-biplot (Gene Expression biplot), that is a useful method for exploring gene expression data with the major advantage of being able to aid interpretation of both the samples and the genes relative to each other.

2.
SUMMARY: The large amount of data produced by proteomics experiments requires effective bioinformatics tools for the integration of data management and data analysis. Here we introduce a suite of tools developed at Vanderbilt University to support production proteomics. We present the Backup Utility Service tool for automated instrument file backup and the ScanSifter tool for data conversion. We also describe a queuing system to coordinate identification pipelines and the File Collector tool for batch copying analytical results. These tools are individually useful but collectively reinforce each other. They are particularly valuable for proteomics core facilities or research institutions that need to manage multiple mass spectrometers. With minor changes, they could support other types of biomolecular resource facilities.

3.
Summary: This note is in response to Wouters et al. (2003, Biometrics 59, 1131–1139) who compared three methods for exploring gene expression data. Contrary to their summary that principal component analysis is not very informative, we show that it is possible to determine principal component analyses that are useful for exploratory analysis of microarray data. We also present another biplot representation, the GE‐biplot (Gene Expression biplot), that is a useful method for exploring gene expression data with the major advantage of being able to aid interpretation of both the samples and the genes relative to each other.

4.
We present a new approach to site-specifically biotinylate proteins in a cell-free protein synthesis system with puromycin-containing small molecules. With this new method, biotinylated proteins were generated from the DNA templates in a matter of hours, making it useful for protein microarray generation. We also validated that the method is compatible with other high-throughput cloning/proteomics methods.

5.
6.
The quest to understand biological systems requires further attention of the scientific community to the challenges faced in proteomics. In fact, the complexity of the proteome reaches uncountable orders of magnitude. This means that significant technical and data‐analytic innovations will be needed for the full understanding of biology. Current state-of-the-art MS is probably our best choice for studying protein complexity, and exploring new ways to use MS and MS-derived data should be given higher priority. We present here a brief overview of visualization and statistical analysis strategies for quantitative peptide values on an individual protein basis. These analysis strategies can help pinpoint protein modifications, splice variants, and genomic variants of biological relevance. We demonstrate the application of these data analysis strategies using a bottom‐up proteomics dataset obtained in a drug profiling experiment. Furthermore, we have also observed that the presented methods are useful for studying peptide distributions from clinical samples from a large number of individuals. We expect that the presented data analysis strategy will be useful in the future to define functional protein variants in biological model systems and disease studies. Therefore, robust software implementing these strategies is urgently needed.

7.
The objective of proteomics is to get an overview of the proteins expressed at a given point in time in a given tissue and to identify the connection to the biochemical status of that tissue. Therefore, sample throughput and analysis time are important issues in proteomics. The concept of proteomics is to encircle the identity of proteins of interest. However, the overall relation between proteins must also be explained. Classical proteomics consists of separation and characterization, based on two-dimensional electrophoresis, trypsin digestion, mass spectrometry and database searching. Characterization includes labor-intensive work in order to manage, handle and analyze data. The field of classical proteomics should therefore be extended to also include handling of large datasets in an objective way. The separation obtained by two-dimensional electrophoresis and mass spectrometry gives rise to a huge amount of data. We present a multivariate approach to the handling of data in proteomics with the advantage that protein patterns can be spotted at an early stage and consequently the proteins selected for sequencing can be selected intelligently. These methods can also be applied to other data-generating protein analysis methods such as mass spectrometry and near infrared spectroscopy, and examples of application to these techniques are also presented. Multivariate data analysis can unravel complicated data structures and may thereby relieve the characterization phase in classical proteomics. Traditional statistical methods are not suitable for analysis of such huge amounts of data, where the number of variables exceeds the number of objects. Multivariate data analysis, on the other hand, may uncover the hidden structures present in these data. This study takes its starting point in the field of classical proteomics and shows how multivariate data analysis can lead to faster ways of finding interesting proteins. Multivariate analysis has shown interesting results as a supplement to classical proteomics and added a new dimension to the field of proteomics.

8.
Abstract: New methods for performing quantitative proteome analyses based on differential labeling protocols or label-free techniques are reported in the literature on an almost monthly basis. In parallel, a correspondingly vast number of software tools for the analysis of quantitative proteomics data have also been described in the literature and produced by private companies. In this article we focus on the review of some of the most popular techniques in the field and present a critical appraisal of several software packages available to process and analyze the data produced. We also describe the importance of community standards to support the wide range of software, which may assist researchers in the analysis of data using different platforms and protocols. It is intended that this review will serve bench scientists both as a useful reference and a guide to the selection and use of different pipelines to perform quantitative proteomics data analysis. We have produced a web-based tool ( http://www.proteosuite.org/?q=other_resources ) to help researchers find appropriate software for their local instrumentation, available file formats, and quantitative methodology.

9.
Most proteomics experiments make use of 'high throughput' technologies such as 2-DE, MS or protein arrays to measure simultaneously the expression levels of thousands of proteins. Such experiments yield large, high-dimensional data sets which usually reflect not only the biological but also technical and experimental factors. Statistical tools are essential for evaluating these data and preventing false conclusions. Here, an overview is given of some typical statistical tools for proteomics experiments. In particular, we present methods for data preprocessing (e.g. calibration, missing values estimation and outlier detection), comparison of protein expression in different groups (e.g. detection of differentially expressed proteins or classification of new observations) as well as the detection of dependencies between proteins (e.g. protein clusters or networks). We also discuss questions of sample size planning for some of these methods.

10.
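Research progress in differential proteomics   Total citations: 10, self-citations: 0, citations by others: 10
Sun Yanwei (孙言伟)  Jiang Ying (姜颖)  He Fuchu (贺福初) 《生命科学》2005,17(2):137-140
Differential proteomics is a major component of proteomics research. Its core aim is to find the proteome differences between samples that are caused by a specific factor, and to reveal and verify how the proteome changes during physiological or pathological processes. Further analysis of this differential information can, in theory, identify the cause of those changes. The field is therefore of great practical value for clinical tumor pre-diagnosis, drug target discovery, and the identification of cell regulatory molecules. Differential proteomics demands reliability and reproducibility, and so places high requirements on sample handling; laser microdissection and high-abundance protein depletion have improved sample preparation. At present, the main approach in differential proteomics is still the combination of 2-DE separation and MS identification. The 2-DE-based 2-D DIGE method compensates for the weaknesses of 2-DE and is better suited to differential proteomics. Techniques other than 2-DE, such as multidimensional liquid chromatography, ICAT, and protein chips, can complement 2-DE or even replace it.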

11.
In quantitative proteomics work, the differences in expression of many separate proteins are routinely examined to test for significant differences between treatments. This leads to the multiple hypothesis testing problem: when many separate tests are performed, many will be significant by chance and be false positive results. Statistical methods such as the false discovery rate method that deal with this problem have been disseminated for more than a decade. However, a survey of proteomics journals shows that such tests are not widely implemented in one commonly used technique, quantitative proteomics using two-dimensional electrophoresis. We outline a selection of multiple hypothesis testing methods, including some that are well known and some lesser known, and present a simple strategy for their use by the experimental scientist in quantitative proteomics work generally. The strategy focuses on the desirability of simultaneous use of several different methods, the choice and emphasis dependent on research priorities and the results in hand. This approach is demonstrated using case scenarios with experimental and simulated model data.
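
As a hedged illustration of the multiple testing problem described in this abstract, the sketch below implements the Benjamini-Hochberg step-up procedure, one widely used false discovery rate method. The abstract does not say which methods the authors cover, so the choice of procedure, the function name `benjamini_hochberg`, and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    not taken from the paper summarized above.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject the hypotheses holding the max_k smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for ten protein spots.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(sum(x < 0.05 for x in p), "significant without correction")
    print(sum(benjamini_hochberg(p, alpha=0.05)), "significant after BH")
```

With these made-up values, five tests fall below 0.05 when taken one at a time, but only the two smallest survive the correction at an FDR of 0.05.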

12.
13.
14.
The design and analysis of experiments using gene expression microarrays is a topic of considerable current research, and work is beginning to appear on the analysis of proteomics and metabolomics data by mass spectrometry and NMR spectroscopy. The literature in this area is evolving rapidly, and commercial software for analysis of array or proteomics data is rarely up to date, and is essentially nonexistent for metabolomics data. In this paper, I review some of the issues that should concern any biologist planning to use such high-throughput biological assay data in an experimental investigation. Technical details are kept to a minimum, and may be found in the referenced literature, as well as in the many excellent papers which space limitations prevent my describing. There are usually a number of viable options for design and analysis of such experiments, but unfortunately, there are even more non-viable ones that have been used even in the published literature. This is an area in which up-to-date knowledge of the literature is indispensable for efficient and effective design and analysis of these experiments. In general, we concentrate on relatively simple analyses, often focusing on identifying differentially expressed genes and the comparable issues in mass spectrometry and NMR spectroscopy (consistent differences in peak heights or areas, for example). Complex multivariate and pattern recognition methods also need much attention, but the issues we describe in this paper must be dealt with first. The literature on analysis of proteomics and metabolomics data is as yet sparse, so the main focus of this paper will be on methods devised for analysis of gene expression data that generalize to proteomics and metabolomics, with some specific comments near the end on analysis of metabolomics data by mass spectrometry and NMR spectroscopy.

15.
Wang J  Li C  Wang E  Wang X 《PloS one》2011,6(1):e14449
Accurately predicting the localization of proteins is of paramount importance in the quest to determine their respective functions within the cellular compartment. Because of the continuous and rapid progress in the fields of genomics and proteomics, more data are available now than ever before. Coincidentally, data mining methods have been developed and refined in order to handle this experimental windfall, thus allowing the scientific community to quantitatively address long-standing questions such as that of protein localization. Here, we develop a frequent pattern tree (FPT) approach to generate a minimum set of rules (mFPT) for predicting protein localization. We acquire a series of rules according to the features of yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods such as Bayesian networks and logistic regression under various statistical measures. Our results show that mFPT gave better performance than other approaches in predicting protein localization. Meanwhile, setting 0.65 as the minimum hit-rate, we obtained 138 proteins that mFPT predicted differently than the simple naive Bayesian method (SNB). In our analysis of these 138 proteins, we present novel predictions for the location of 17 proteins, which currently do not have any defined localization. These predictions can serve as putative annotations and should provide preliminary clues for experimentalists. We also compared our predictions against the eukaryotic subcellular localization database and related predictions by others on protein localization. Our method is quite generalized and can thus be applied to discover the underlying rules for protein-protein interactions, genomic interactions, and structure-function relationships, as well as those of other fields of research.

16.
Techniques for analyzing genome-wide expression profiles, such as the microarray technique and next-generation sequencers, have been developed. While these techniques can provide a lot of information about gene expression, selection of genes of interest is complicated because of excessive gene expression data. Thus, many researchers use statistical methods or fold change as screening tools for finding gene sets whose expression is altered between groups, which may result in the loss of important information. In the present study, we aimed to establish a combined method for selecting genes of interest with a small magnitude of alteration in gene expression by coupling with proteome analysis. We used hypercholesterolemic rats to examine the effects of a crude herbal drug on gene expression and proteome profiles. We could not select genes of interest by using standard methods. However, by coupling with proteome analysis, we found several effects of the crude herbal drug on gene expression. Our results suggest that this method would be useful in selecting gene sets with expressions that do not show a large magnitude of alteration.

17.
18.
MOTIVATION: Experimental techniques in proteomics have seen rapid development over the last few years. Volume and complexity of the data have both been growing at a similar rate. Accordingly, data management and analysis are among the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. In order to facilitate these studies, it would be desirable to have a flexible 'toolbox' of versatile and user-friendly applications allowing for rapid construction of computational workflows in proteomics. RESULTS: We describe a set of tools for proteomics data analysis: TOPP, The OpenMS Proteomics Pipeline. TOPP provides a set of computational tools which can be easily combined into analysis pipelines even by non-experts and can be used in proteomics workflows. These applications range from useful utilities (file format conversion, peak picking) through wrapper applications for known applications (e.g. Mascot) to completely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. As such, we describe the basic concepts and the current abilities of TOPP and illustrate these concepts in the context of two example applications: the identification of peptides from a raw dataset through database search and the complex analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups. AVAILABILITY: The TOPP components are available as open-source software under the GNU Lesser General Public License (LGPL). Source code is available from the project website at www.OpenMS.de

19.
Normalized spectral index quantification was recently presented as an accurate method of label‐free quantitation, which improved spectral counting by incorporating the intensities of peptide MS/MS fragment ions into the calculation of protein abundance. We present SINQ, a tool implementing this method within the framework of existing analysis software, our freely available central proteomics facilities pipeline (CPFP). We demonstrate, using data sets of protein standards acquired on a variety of mass spectrometers, that SINQ can rapidly provide useful estimates of the absolute quantity of proteins present in a medium‐complexity sample. In addition, relative quantitation of standard proteins spiked into a complex lysate background and run without pre‐fractionation produces accurate results at amounts above 1 fmol on column. We compare quantitation performance to various precursor intensity‐ and identification‐based methods, including the normalized spectral abundance factor (NSAF), exponentially modified protein abundance index (emPAI), MaxQuant, and Progenesis LC‐MS. We anticipate that the SINQ tool will be a useful asset for core facilities and individual laboratories that wish to produce quantitative MS data, but lack the necessary manpower to routinely support more complicated software workflows. SINQ is freely available to obtain and use as part of the central proteomics facilities pipeline, which is released under an open‐source license.
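
This abstract compares SINQ with identification-based measures such as NSAF. As a point of reference, the sketch below computes NSAF in its usual form: each protein's spectral count divided by its length, normalized by the sum of that ratio over all proteins in the run. It is not SINQ itself and not code from CPFP; the function name `nsaf`, the dictionary-based input, and the example proteins are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    spectral_counts: dict mapping protein ID -> number of identified spectra
    lengths:         dict mapping protein ID -> protein length in residues
    Illustrative sketch only; input layout is an assumption.
    """
    # Spectral abundance factor: spectral count per unit of protein length.
    saf = {prot: spectral_counts[prot] / lengths[prot] for prot in spectral_counts}

    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")

    # Normalize so that the values sum to 1 across the run.
    return {prot: value / total for prot, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical run: A and B have equal counts, but B is twice as long.
    counts = {"A": 10, "B": 10, "C": 0}
    lengths = {"A": 100, "B": 200, "C": 50}
    for prot, value in nsaf(counts, lengths).items():
        print(prot, round(value, 3))
```

Dividing by length is what separates NSAF from raw spectral counting: in the example, proteins A and B have the same count, yet A receives twice the abundance estimate because it is half as long.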

20.
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method "Remove Unwanted Variation, 2-step" (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that there may be promise but substantial challenges remain.
