20 similar articles found (search time: 0 ms)
1.
Most biochemical reactions in a cell are regulated by highly specialized proteins, which are the prime mediators of the cellular phenotype. Therefore, the identification, quantitation and characterization of all proteins in a cell are of utmost importance to understand the molecular processes that mediate cellular physiology. With the advent of robust and reliable mass spectrometers that are able to analyze complex protein mixtures within a reasonable timeframe, the systematic analysis of all proteins in a cell becomes feasible. Besides the ongoing improvements of analytical hardware, standardized methods to analyze and study all proteins have to be developed that allow the generation of testable new hypotheses based on the enormous pre-existing amount of biological information. Here we discuss current strategies on how to gather, filter and analyze proteomic data sets using available software packages.
2.
3.
4.
Chanchal Kumar 《FEBS letters》2009,583(11):1703-1712
Proteomics has made tremendous progress, attaining throughput and comprehensiveness so far only seen in genomics technologies. The consequent avalanche of proteome-level data poses great analytical challenges for downstream interpretation. We review bioinformatic analysis of qualitative and quantitative proteomic data, focusing on current and emerging paradigms employed for functional analysis, data mining and knowledge discovery from high-resolution quantitative mass spectrometric data. Many bioinformatics tools developed for microarrays can be reused in proteomics; however, the uniquely quantitative nature of proteomics data also offers entirely novel analysis possibilities, which directly suggest and illuminate biological mechanisms.
5.
Most proteomics experiments make use of 'high throughput' technologies such as 2-DE, MS or protein arrays to measure simultaneously the expression levels of thousands of proteins. Such experiments yield large, high-dimensional data sets which usually reflect not only biological but also technical and experimental factors. Statistical tools are essential for evaluating these data and preventing false conclusions. Here, an overview is given of some typical statistical tools for proteomics experiments. In particular, we present methods for data preprocessing (e.g. calibration, missing value estimation and outlier detection), comparison of protein expression in different groups (e.g. detection of differentially expressed proteins or classification of new observations) as well as the detection of dependencies between proteins (e.g. protein clusters or networks). We also discuss questions of sample size planning for some of these methods.
6.
There are many data mining techniques for processing and general learning of multivariate data. However, we believe the wavelet transformation and latent variable projection methods are particularly useful for spectroscopic and chromatographic data. Projection-based methods are designed to handle the hugely multivariate nature of such data effectively. For the actual analysis of the data we have used latent variable projection methods such as principal component analysis (PCA) and partial least squares projection to latent structures based discriminant analysis (PLS-DA) to analyze the raw data presented to the participants of the First Duke Proteomics Data Mining Conference. PCA was used to solve problem #1 (clustering problem) and PLS-DA was used to solve problem #2 (classification problem). Internal and external cross-validation were used to validate the model obtained from the classification analysis. The simple two-component PLS-DA model obtained from the analysis performed well. The model completely separated the two groups when built from all the data. The same model built on two-thirds of the data showed good performance in external validation with an independent test set of the remaining 13 specimens, obtained by setting aside the spectra of every third specimen (accuracy of 85%).
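The external-validation split described in the abstract above (every third specimen set aside as a test set) can be sketched as follows. This is a minimal illustration, not the authors' code: the total of 39 specimens is an assumption chosen so that 13 land in the test set, and `every_third_split` and `accuracy` are hypothetical helper names.

```python
# Hypothetical specimen count; the abstract only states that 13 specimens
# ended up in the external test set.
N_SPECIMENS = 39


def every_third_split(n):
    """Return (train, test) index lists, holding out every third specimen."""
    test = [i for i in range(n) if i % 3 == 2]   # 3rd, 6th, 9th, ...
    train = [i for i in range(n) if i % 3 != 2]  # the remaining two-thirds
    return train, test


def accuracy(n_correct, n_total):
    """Fraction of test specimens classified correctly."""
    return n_correct / n_total


train_idx, test_idx = every_third_split(N_SPECIMENS)
print(len(train_idx), len(test_idx))
print(round(accuracy(11, 13), 3))
```

As a plausibility check only, 11 correct calls out of 13 would round to the reported 85%; the abstract does not state the actual count.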
7.
Epigenetic changes caused by DNA methylation and histone modifications play important roles in the regulation of various cellular processes and development. Recent discoveries of 5-methylcytosine (5mC) oxidation derivatives including 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxycytosine (5caC) in mammalian genomes further expand our understanding of epigenetic regulation. Analysis of DNA modification patterns relies increasingly on sequencing-based profiling methods. A number of different approaches have been established to map DNA epigenomes with single-base resolution, as represented by the bisulfite-based methods, such as classical bisulfite sequencing (BS-seq), TAB-seq (TET-assisted bisulfite sequencing), oxBS-seq (oxidative bisulfite sequencing) and others. These methods have been used to generate base-resolution maps of 5mC and its oxidation derivatives in genomic samples. The focus of this review is the chemical methodologies that have been developed to detect the cytosine derivatives in genomic DNA.
8.
Background
Many trypanosomatid protozoa are important human or animal pathogens. The well-defined morphology and precisely choreographed division of trypanosomatid cells make morphological analysis a powerful tool for analyzing the effect of mutations, chemical insults and changes between lifecycle stages. High-throughput image analysis of micrographs has the potential to accelerate collection of quantitative morphological data. Trypanosomatid cells have two large DNA-containing organelles, the kinetoplast (mitochondrial DNA) and nucleus, which provide useful markers for morphometric analysis; however, they need to be accurately identified and often lie in close proximity. This presents a technical challenge. Accurate identification and quantitation of the DNA content of these organelles is a central requirement of any automated analysis method.
9.
Malinowska A Kistowski M Bakun M Rubel T Tkaczyk M Mierzejewska J Dadlez M 《Journal of Proteomics》2012,75(13):4062-4073
Mass spectrometry-based global proteomics experiments generate large sets of data that can be converted into useful information only with an appropriate statistical approach. We present Diffprot, a software tool for statistical analysis of MS-derived quantitative data. With an implemented resampling-based statistical test and local variance estimate, Diffprot allows significant results to be drawn from small-scale experiments and effectively eliminates false positive results. To demonstrate the advantages of this software, we performed two spike-in tests with complex biological matrices, one label-free and one based on iTRAQ quantification; in addition, we performed an iTRAQ experiment on bacterial samples. In the spike-in tests, protein ratios were estimated and were in good agreement with theoretical values; statistical significance was assigned to spiked proteins and single or no false positive results were obtained with Diffprot. We compared the performance of Diffprot with other statistical tests: the widely used t-test and the non-parametric Wilcoxon test. In contrast to Diffprot, both generated many false positive hits in the spike-in experiment. This demonstrated the superiority of the resampling-based method in terms of specificity, making Diffprot a rational choice for small-scale high-throughput experiments, when the need to control the false positive rate is particularly pressing.
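To make the idea of a resampling-based test concrete, here is a generic exact permutation test on the difference of group means. It is a sketch of the general technique only and is not Diffprot's algorithm (the abstract does not describe its test statistic or its local variance estimate); the function name and the intensity values are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up log2 intensities for one protein in two conditions.
control = [10.1, 10.3, 9.9, 10.0]
treated = [11.2, 11.0, 11.4, 11.1]
p = permutation_p_value(control, treated)
print(p)
```

With four replicates per group there are only 70 relabellings, so the smallest attainable two-sided p-value is 2/70. This illustrates why small experiments need care beyond a single per-protein test.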
10.
Koikkalainen J Pölönen H Mattila J van Gils M Soininen H Lötjönen J;Alzheimer's Disease Neuroimaging Initiative 《PloS one》2012,7(2):e31112
Diagnosis of Alzheimer's disease is based on the results of neuropsychological tests and available supporting biomarkers such as the results of imaging studies. The results of the tests and the values of biomarkers depend on nuisance features, such as age and gender. In order to improve diagnostic power, the effects of the nuisance features have to be removed from the data. In this paper, four types of interactions between classification features and nuisance features were identified. Three methods were tested to remove these interactions from the classification data. In stratified analysis, a homogeneous subgroup was generated from a training set. The data correction method used a linear regression model to remove the effects of nuisance features from the data. The third method was a combination of these two methods. The methods were tested using all the baseline data from the Alzheimer's Disease Neuroimaging Initiative database in two classification studies: classifying control subjects from Alzheimer's disease patients and discriminating stable and progressive mild cognitive impairment subjects. The results show that both stratified analysis and data correction are able to statistically significantly improve the classification accuracy of several neuropsychological tests and imaging biomarkers. The improvements were especially large for the classification of stable and progressive mild cognitive impairment subjects, where the best improvements observed were 6 percentage points. The data correction method gave better results for imaging biomarkers, whereas stratified analysis worked well with the neuropsychological tests. In conclusion, the study shows that the excess variability caused by nuisance features should be removed from the data to improve the classification accuracy and, therefore, the reliability of diagnosis.
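The regression-based data correction mentioned in the abstract above can be illustrated with a single nuisance feature. The sketch below assumes the age effect is estimated on control subjects and then subtracted from every subject's value; the abstract does not confirm that choice. The helper names and all numbers are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def remove_nuisance(x, y, slope, reference_x):
    """Subtract the fitted nuisance effect, re-centred at reference_x."""
    return [yi - slope * (xi - reference_x) for xi, yi in zip(x, y)]


# Made-up data: a biomarker that declines with age in healthy controls.
control_age = [60, 65, 70, 75, 80]
control_value = [5.0, 4.6, 4.1, 3.5, 3.0]
intercept, slope = fit_line(control_age, control_value)

# Two hypothetical patients, corrected to a common reference age of 70.
patient_age = [62, 78]
patient_value = [3.9, 2.4]
corrected = remove_nuisance(patient_age, patient_value, slope, reference_x=70)
print(slope, corrected)
```

After correction, the two patients' values are comparable as if both had been measured at the reference age, so a classifier sees disease-related variation rather than age-related variation.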
11.
James M. Carpenter 《Zoologica scripta》1999,28(1-2):251-260
Principles and methods of simultaneous analysis in cladistics are reviewed, and the first, preliminary, analysis of combined molecular and morphological data on higher level relationships in Hymenoptera is presented to exemplify these principles. The morphological data from the Ronquist et al. (1999) matrix, derived from the character diagnoses of the phylogenetic tree of Rasnitsyn (1988), are combined with new molecular data for representatives of 10 superfamilies of Hymenoptera by means of optimization alignment. The resulting cladogram supports Apocrita and Aculeata as groups, and the superfamily Chrysidoidea, but not Chalcidoidea, Evanioidea, Vespoidea and Apoidea.
12.
Lee JS Ma YB Choi KS Park SY Baek SH Park YM Zu K Zhang H Ip C Kim YH Park EM 《Preparative biochemistry & biotechnology》2006,36(1):37-64
Generation of a monomethylated selenium metabolite is critical for the anticancer activity of selenium. Because of its strong nucleophilicity, the metabolite can react directly with protein thiols to cause redox modification. Here, we report a neural network-based analysis to identify potential selenium targets. A reactive thiol-specific reagent, BIAM, was used to monitor thiol proteome changes on 2D gels. We constructed a dynamic model and evaluated the relative importance of proteins mediating the cellular responses to selenium. Information from this study will provide new clues to unravel mechanisms of anticancer action of selenium. High-impact selenium targets could also serve as biomarkers to gauge the efficacy of selenium chemoprevention.
13.
In this work, the application of a multivariate curve resolution procedure based on alternating least squares optimization (MCR-ALS) for the analysis of data from DNA microarrays is proposed. For this purpose, simulated and publicly available experimental data sets have been analyzed. Application of MCR-ALS, a method that operates without the use of any training set, has enabled the resolution of the relevant information about the classification of different cancer lines using a small set of components, each defined by a sample profile and a pure gene expression profile. From the resolved sample profiles, a classification of samples according to their origin is proposed. From the resolved pure gene expression profiles, a set of over- or underexpressed genes that could be related to the development of cancer diseases has been selected. Advantages of the MCR-ALS procedure in relation to other previously proposed procedures such as principal component analysis are discussed.
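The alternating least squares idea behind MCR-ALS can be shown in its simplest form: a single component with non-negativity constraints, where the data matrix is approximated by the outer product of a sample profile `c` and an expression profile `s`. This is a deliberately reduced sketch; real MCR-ALS resolves several components and supports further constraints. The function name and the toy matrix are assumptions made for illustration.

```python
def als_rank1(data, n_iter=50):
    """Alternate least-squares updates for data ~= c s^T with c, s >= 0."""
    n_rows, n_cols = len(data), len(data[0])
    s = [1.0] * n_cols  # initial guess for the expression profile
    c = [0.0] * n_rows
    for _ in range(n_iter):
        # Fix s, solve for each element of c, and clip negatives to zero.
        ss = sum(v * v for v in s)
        c = [max(0.0, sum(data[i][j] * s[j] for j in range(n_cols)) / ss)
             for i in range(n_rows)]
        # Fix c, solve for each element of s in the same way.
        cc = sum(v * v for v in c)
        s = [max(0.0, sum(data[i][j] * c[i] for i in range(n_rows)) / cc)
             for j in range(n_cols)]
    return c, s


# Toy matrix built from known profiles (3 samples x 4 genes).
c_true = [1.0, 2.0, 3.0]
s_true = [0.5, 1.0, 2.0, 0.0]
data = [[ci * sj for sj in s_true] for ci in c_true]

c_hat, s_hat = als_rank1(data)
recon = [[ci * sj for sj in s_hat] for ci in c_hat]
max_err = max(abs(recon[i][j] - data[i][j])
              for i in range(3) for j in range(4))
print(max_err)
```

The recovered profiles are only defined up to a scale factor (doubling `c` and halving `s` gives the same product), which is why the check compares the reconstructed matrix rather than the profiles themselves.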
14.
Pathway analysis of microarray data evaluates gene expression profiles of a priori defined biological pathways in association with a phenotype of interest. We propose a unified pathway-analysis method that can be used for diverse phenotypes including binary, multiclass, continuous, count, rate, and censored survival phenotypes. The proposed method also allows covariate adjustments and correlation in the phenotype variable that is encountered in longitudinal, cluster-sampled, and paired designs. These are accomplished by combining the regression-based test statistic for each individual gene in a pathway of interest into a pathway-level test statistic. Applications of the proposed method are illustrated with two real pathway-analysis examples: one evaluating relapse-associated gene expression involving a matched-pair binary phenotype in children with acute lymphoblastic leukemia; and the other investigating gene expression in breast cancer tissues in relation to patients' survival (a censored survival phenotype). Implementations for various phenotypes are available in R. Additionally, an Excel Add-in for a user-friendly interface is currently being developed.
15.
Junker J Bielow C Bertsch A Sturm M Reinert K Kohlbacher O 《Journal of proteome research》2012,11(7):3914-3920
Mass spectrometry coupled to high-performance liquid chromatography (HPLC-MS) is evolving more quickly than ever. A wide range of different instrument types and experimental setups are commonly used. Modern instruments acquire huge amounts of data, thus requiring tools for efficient and automated data analysis. Most existing software for analyzing HPLC-MS data is monolithic and tailored toward a specific application. A more flexible alternative consists of pipeline-based tool kits allowing the construction of custom analysis workflows from small building blocks, e.g., the Trans Proteomics Pipeline (TPP) or The OpenMS Proteomics Pipeline (TOPP). One drawback, however, is the hurdle of setting up complex workflows using command line tools. We present TOPPAS, The OpenMS Proteomics Pipeline ASsistant, a graphical user interface (GUI) for rapid composition of HPLC-MS analysis workflows. Workflow construction reduces to simple drag-and-drop of analysis tools and adding connections between them. Integration of external tools into these workflows is possible as well. Once workflows have been developed, they can be deployed in other workflow management systems or batch processing systems in a fully automated fashion. The implementation is portable and has been tested under Windows, Mac OS X, and Linux. TOPPAS is open-source software and available free of charge at http://www.OpenMS.de/TOPPAS.
16.
Improved spectral resolution in COSY 1H NMR spectra of proteins via double quantum filtering (cited by: 78 in total, 0 self-citations, 78 by others)
Thierry Buchou Jan Mester Jack-Michel Renoir Etienne-Emile Baulieu 《Biochemical and biophysical research communications》1983,117(2):479-485
A double quantum filter is inserted into a two-dimensional correlated (COSY) 1H NMR experiment to obtain phase-sensitive spectra in which both cross peak and diagonal peak multiplets have anti-phase fine structure, and in which the cross peaks and the major contribution to the diagonal peaks have absorption lineshapes in both dimensions. The elimination of the dispersive character of the diagonal peaks in phase-sensitive, double quantum-filtered COSY spectra allows identification of cross peaks lying immediately adjacent to the diagonal, which represents a significant improvement over the conventional COSY experiment.
17.
The human Plasma Proteome Project (PPP) is a large-scale collaboration between many laboratories. One of the most demanding tasks in the PPP involved the analysis of very large amounts of raw MS/MS data produced by the participants. The main approach for managing this task was letting the participants analyze their own data and submit the results to the central PPP repository as lists of identified proteins and peptides. To complement this distributed approach, we also performed centralized analysis of the raw MS/MS data provided by the participants. Due to the data redundancy inherent in such a project, centralized analysis has the potential to reduce the computational effort by reducing redundancy before the analysis. Centralized analysis can also unify the process and take advantage of data sharing among laboratories to improve protein identification and validation. The process we employed included removing low-quality spectra, clustering spectra by mutual similarity, and applying uniform peptide and protein identification procedures. To demonstrate the process, we analyzed 5.28 million MS/MS spectra derived by eight laboratories from tryptic peptides of serum and plasma proteins.
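The step of clustering spectra by mutual similarity can be pictured with a small greedy scheme based on cosine similarity between binned intensity vectors. The abstract does not say which similarity measure or clustering algorithm was used, so the measure, the 0.9 threshold, the function names and the toy spectra below are all assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equally binned intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_cluster(spectra, threshold=0.9):
    """Assign each spectrum to the first cluster whose first member is similar."""
    clusters = []  # each cluster is a list of spectrum indices
    for i, spec in enumerate(spectra):
        for members in clusters:
            if cosine(spectra[members[0]], spec) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([i])
    return clusters


# Toy binned spectra: indices 0, 1 and 3 share the same dominant peaks.
spectra = [
    [0, 10, 0, 5, 0],
    [0, 9, 1, 5, 0],
    [7, 0, 0, 0, 3],
    [0, 10, 0, 6, 0],
]
clusters = greedy_cluster(spectra)
print(clusters)
```

Searching one representative per cluster instead of every member is what would reduce the computational effort when many laboratories measure the same peptides.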
18.
Hao Y Merkoulovitch A Vlasblom J Pu S Turinsky AL Roudeva D Turner B Greenblatt J Wodak SJ 《Bioinformatics (Oxford, England)》2011,27(6):883-884
MOTIVATION: Protein interaction networks contain a wealth of biological information, but their large size often hinders cross-organism comparisons. We present OrthoNets, a Cytoscape plugin that displays protein-protein interaction (PPI) networks from two organisms simultaneously, highlighting orthology relationships and aggregating several types of biomedical annotations. OrthoNets also allows PPI networks derived from experiments to be overlaid on networks extracted from public databases, supporting the identification and verification of new interactors. Any newly identified PPIs can be validated by checking whether their orthologs interact in another organism. AVAILABILITY: OrthoNets is freely available at http://wodaklab.org/orthonets/.
19.
Identification of genes differentially expressed across multiple conditions has become an important statistical problem in analyzing large-scale microarray data. Many statistical methods have been developed to address this challenging problem. Therefore, an extensive comparison among these statistical methods is extremely important for experimental scientists to choose a valid method for their data analysis. In this study, we conducted simulation studies to compare six statistical methods: the Bonferroni (B-) procedure, the Benjamini and Hochberg (BH-) procedure, the Local false discovery rate (Localfdr) method, the Optimal Discovery Procedure (ODP), the Ranking Analysis of F-statistics (RAF), and the Significance Analysis of Microarrays (SAM) in identifying differentially expressed genes. We demonstrated that the strength of the treatment effect, the sample size, the proportion of differentially expressed genes and the variance of gene expression significantly affect the performance of the different methods. The simulated results show that ODP exhibits extremely high power in identifying differentially expressed genes, but significantly underestimates the False Discovery Rate (FDR) in all data scenarios. SAM performs poorly when the sample size is small, but is among the best-performing methods when the sample size is large. The B-procedure is stringent and thus has low power in all data scenarios. Localfdr and RAF show statistical behavior comparable to the BH-procedure, with favorable power and conservativeness of FDR estimation. RAF performs best when the proportion of differentially expressed genes is small and the treatment effect is weak, but Localfdr is better than RAF when the proportion of differentially expressed genes is large.
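Two of the six procedures compared above, Bonferroni and Benjamini-Hochberg, are simple enough to write out directly. The sketch below follows their textbook definitions; the function names and the list of p-values are made up for illustration and do not come from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls family-wise error)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up BH procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha (controls the FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Made-up p-values for ten genes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.06, 0.074, 0.205, 0.212, 0.216]
n_bonf = sum(bonferroni(p_values))
n_bh = sum(benjamini_hochberg(p_values))
print(n_bonf, n_bh)
```

On this toy input Bonferroni rejects one hypothesis and BH rejects two, which mirrors the abstract's remark that the Bonferroni procedure is stringent and therefore less powerful.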
20.
Detection and validation of non-synonymous coding SNPs from orthogonal analysis of shotgun proteomics data (cited by: 1 in total, 0 self-citations, 1 by others)
Bunger MK Cargile BJ Sevinsky JR Deyanova E Yates NA Hendrickson RC Stephenson JL 《Journal of proteome research》2007,6(6):2331-2340
Orthogonal analysis of amino acid substitutions as a result of SNPs in existing proteomic datasets provides a critical foundation for the emerging field of population-based proteomics. Large-scale proteomics datasets, derived from shotgun tandem MS analysis of complex cellular protein mixtures, contain many unassigned spectra that may correspond to alternate alleles coded by SNPs. The purpose of this work was to identify tandem MS spectra in LC-MS/MS shotgun proteomics datasets that may represent coding nonsynonymous SNPs (nsSNPs). To this end, we generated a tryptic peptide database created from allelic information found in NCBI's dbSNP. We searched this database with tandem MS spectra of tryptic peptides from DU4475 breast tumor cells that had been fractionated by pI in the first dimension and reverse-phase LC in the second dimension. In all, we identified 629 nsSNPs, of which 36 were of alternate SNP alleles not found in the reference NCBI or IPI protein databases. Searches for SNP-peptides carry a high risk of false positives owing both to mass shifts caused by modifications and to multiple representations of the same peptide within the genome. In this work, false positives were filtered using a novel peptide pI prediction algorithm and characterized using a decoy database developed by random substitution of similarly sized reference peptides. Secondary validation by sequencing of corresponding genomic DNA confirmed the presence of the predicted SNP in 8 of 10 SNP-peptides. This work highlights that the usefulness of interpreting unassigned spectra as polymorphisms is highly reliant on the ability to detect and filter false positives.
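Building a tryptic peptide database from allelic information can be pictured as digesting both the reference protein and its variant in silico and keeping the peptides that occur only in the variant. The sketch below uses the common trypsin rule (cleave after K or R unless followed by P) with no missed cleavages. The sequence, the Y5H substitution and the function names are hypothetical, and the authors' actual pipeline is not described at this level in the abstract.

```python
def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def apply_substitution(protein, position, alt):
    """Return the protein with a single-residue substitution (1-based position)."""
    return protein[:position - 1] + alt + protein[position:]


# Hypothetical reference sequence and a made-up Y5H substitution.
reference = "MKTAYIAKQRPLSGEVRAAGTK"
variant = apply_substitution(reference, 5, "H")

ref_peptides = tryptic_digest(reference)
snp_peptides = sorted(set(tryptic_digest(variant)) - set(ref_peptides))
print(ref_peptides)
print(snp_peptides)
```

A substitution that creates or removes a K or R would also change the cleavage pattern, so the variant must be digested as a whole protein rather than patched into an existing peptide list.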