Similar literature
 20 similar documents found (search time: 31 ms)
1.
The application of data mining to proteomic data from mass spectrometry has gained much interest in recent years. Advances made in proteomics and mass spectrometry have resulted in a considerable amount of data that cannot be easily visualized or interpreted. Mass spectral proteomic datasets are typically high dimensional but with small sample size. Consequently, advanced artificial intelligence and machine learning algorithms are increasingly being used for knowledge discovery from such datasets. Their overall goal is to extract useful information that leads to the identification of protein biomarker candidates. Such biomarkers could potentially have diagnostic value as tools for early detection, diagnosis, and prognosis of many diseases. The purpose of this review is to focus on the current trends in mining mass spectral proteomic data. Special emphasis is placed on the critical steps involved in the analysis of surface-enhanced laser desorption/ionization mass spectrometry proteomic data. Examples are drawn from previously published studies, and relevant data mining terminology and techniques are explained.

2.
Multi-omics approaches are novel frameworks that integrate multiple omics datasets generated from the same patients to better understand the molecular and clinical features of cancers. A wide range of emerging omics and multi-view clustering algorithms now provide unprecedented opportunities to further classify cancers into subtypes, improve the survival prediction and therapeutic outcome of these subtypes, and understand key pathophysiological processes through different molecular layers. In this review, we give an overview of the concept and rationale of multi-omics approaches in cancer research. We also introduce recent advances in the development of multi-omics algorithms and integration methods for multiple-layered datasets from cancer patients. Finally, we summarize the latest findings from large-scale multi-omics studies of various cancers and their implications for patient subtyping and drug development.

3.
Challenges and solutions in proteomics   (Cited by: 1; self-citations: 0; citations by others: 1)
The accelerated growth of proteomics data presents both opportunities and challenges. Large-scale proteomic profiling of biological samples such as cells, organelles or biological fluids has led to discovery of numerous key and novel proteins involved in many biological/disease processes including cancers, as well as to the identification of novel disease biomarkers and potential therapeutic targets. While proteomic data analysis has been greatly assisted by the many bioinformatics tools developed in recent years, a careful analysis of the major steps and flow of data in a typical high-throughput analysis reveals a few gaps that still need to be filled to fully realize the value of the data. To facilitate functional and pathway discovery for large-scale proteomic data, we have developed an integrated proteomic expression analysis system, iProXpress, which facilitates protein identification using a comprehensive sequence library and functional interpretation using integrated data. With its modular design, iProXpress complements and can be integrated with other software in a proteomic data analysis pipeline. This novel approach to complex biological questions involves the interrogation of multiple data sources, thereby facilitating hypothesis generation and knowledge discovery from genomic-scale studies and fostering disease diagnosis and drug development.

4.
The increasing prevalence of infections involving intracellular apicomplexan parasites such as Plasmodium, Toxoplasma, and Cryptosporidium (the causative agents of malaria, toxoplasmosis, and cryptosporidiosis, respectively) represents a significant global healthcare burden. Despite their significance, few treatments are available, a situation that is likely to deteriorate with the emergence of new resistant strains of parasites. To lay the foundation for programs of drug discovery and vaccine development, genome sequences for many of these organisms have been generated, together with large-scale expression and proteomic datasets. Comparative analyses of these datasets are beginning to identify the molecular innovations supporting both conserved processes mediating fundamental roles in parasite survival and persistence and lineage-specific adaptations associated with divergent life-cycle strategies. The challenge is how best to exploit these data to derive insights into parasite virulence and identify those genes representing the most amenable targets. In this review, we outline genomic datasets currently available for apicomplexans and discuss biological insights that have emerged as a consequence of their analysis. Of particular interest are systems-based resources, focusing on areas of metabolism and host invasion that are opening up opportunities for discovering new therapeutic targets.

5.
MOTIVATION: The 'reproducibility' of mass spectrometry proteomic profiling has become an intensely controversial topic. The mere mention of concern over the 'reproducibility' of data generated from any particular platform can lead to anxiety over the generalizability of its results and its role in the future of discovery proteomics. In this study, we examine the reproducibility of proteomic profiles generated by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) across multiple data-generation sessions. We analyze the problem in terms of the reproducibility of signals, reproducibility of discriminative features and reproducibility of multivariate classification models on profiles for serum samples from early lung cancer and healthy control subjects. RESULTS: Proteomic profiles in individual data-generation sessions experience within-session variability. We show that combining data from multiple sessions introduces additional (inter-session) noise. While additional noise can affect the discriminative analysis, we show that its average effect on profiles in our study is relatively small. Moreover, for the purposes of prediction on future (previously unseen) data, classifiers trained on multi-session data are able to adapt to inter-session noise and improve their classification accuracy.

6.
Creating protein profiles of tissues and tissue fluids, which contain secreted proteins and peptides released from various cells, is critical for biomarker discovery as well as drug and vaccine target selection. It is extremely difficult to obtain pure samples from tissues or tissue fluids, however, and identification of complex protein mixtures is still a challenge for mass spectrometry analysis. Here, we summarize recent advances in techniques for extracting proteins from tissues for mass spectrometry profiling and imaging. We also introduce a novel technique using a capillary ultrafiltration (CUF) probe to enable in vivo collection of proteins from the tissue microenvironment. The CUF probe technique is compared with existing sampling techniques, including perfusion, saline wash, fine-needle aspiration and microdialysis. In this review, we also highlight quantitative mass spectrometric proteomic approaches with, and without, stable-isotope labels. Advances in quantitative proteomics will significantly improve protein profiling of tissue and tissue fluid samples collected by CUF probes.

7.
Global gel-free proteomic analysis by mass spectrometry has been widely used as an important tool for exploring complex biological systems at the whole genome level. Simultaneous analysis of a large number of protein species is a complicated and challenging task. The challenges exist throughout all stages of a global gel-free proteomic analysis: experimental design, peptide/protein identification, data preprocessing and normalization, and inferential analysis. In addition to various efforts to improve the analytical technologies, statistical methodologies have been applied in all stages of proteomic analyses to help extract relevant information efficiently from large proteomic datasets. In this review, we summarize current applications of statistics in several stages of global gel-free proteomic analysis by mass spectrometry. We discuss the challenges associated with the applications of various statistical tools. Whenever possible, we also propose potential solutions on how to improve the data collection and interpretation for mass-spectrometry-based global proteomic analysis using more sophisticated and/or novel statistical approaches.

8.
The rapidly increasing amount of public data in chemistry and biology provides new opportunities for large-scale data mining for drug discovery. Systematic integration of these heterogeneous sets and provision of algorithms to data mine the integrated sets would permit investigation of complex mechanisms of action of drugs. In this work we integrated and annotated data from public datasets relating to drugs, chemical compounds, protein targets, diseases, side effects and pathways, building a semantic linked network consisting of over 290,000 nodes and 720,000 edges. We developed a statistical model to assess the association of drug-target pairs based on their relation with other linked objects. Validation experiments demonstrate that the model can correctly identify known direct drug-target pairs with high precision. Indirect drug-target pairs (for example, drugs which change gene expression level) are also identified but not as strongly as direct pairs. We further calculated the association scores for 157 drugs from 10 disease areas against 1683 human targets, and measured their similarity using a [Formula: see text] score matrix. The similarity network indicates that drugs from the same disease area tend to cluster together in ways that are not captured by structural similarity, with several potential new drug pairings being identified. This work thus provides a novel, validated alternative to existing drug target prediction algorithms. The web service is freely available at: http://chem2bio2rdf.org/slap.

9.
High-accuracy proteome maps of human body fluids   (Cited by: 1; self-citations: 0; citations by others: 1)
The proteomes most likely to contain clinically useful disease biomarkers are those of human body fluids. Three recent large-scale proteomic analyses of tears, urine and seminal plasma using the latest mass spectrometric technology will provide useful datasets for biomarker discovery.

10.
11.
Prediction of molecular interaction networks from large-scale datasets in genomics and other omics experiments is an important task in terms of both developing bioinformatics methods and solving biological problems. We have applied a kernel-based network inference method to extract genes functionally related to the response to nitrogen deprivation in the cyanobacterium Anabaena sp. PCC 7120, integrating three heterogeneous datasets: microarray data, phylogenetic profiles, and gene orders on the chromosome. We obtained 1348 predicted genes that are somehow related to known genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. While this dataset contained previously known genes related to the nitrogen deprivation condition, it also contained additional genes. Thus, we attempted to select any relevant genes using the constraints of Pfam domains and NtcA-binding sites. We found candidate nitrogen metabolism-related genes, which are depicted as extensions of existing KEGG pathways. The prediction of functional relationships between proteins rather than functions of individual proteins will thus assist discovery from large-scale datasets.

12.
Proteomic technologies have experienced major improvements in recent years. Such advances have facilitated the discovery of potential tumor markers with improved sensitivities and specificities for the diagnosis, prognosis and treatment monitoring of cancer patients. This review will focus on four state-of-the-art proteomic technologies, namely 2D difference gel electrophoresis, MALDI imaging mass spectrometry, electron transfer dissociation mass spectrometry and reverse-phase protein array. The major advancements these techniques have brought about and examples of their applications in cancer biomarker discovery will be presented in this review, so that readers can appreciate the immense progress in proteomic technologies from 1997 to 2008. Finally, a summary will be presented that discusses current hurdles faced by proteomic researchers, such as the wide dynamic range of protein abundance, standardization of protocols and validation of cancer biomarkers, and a 5-year view of potential solutions to such problems will be provided.

13.
BACKGROUND: Recognizing specific protein changes in response to drug administration in humans has the potential for the development of personalized medicine. Such changes can be identified by a pharmacoproteomics approach based on proteomic technologies. It can also be helpful in matching a particular target-based therapy to a particular marker in a subgroup of patients, in addition to the profile of genetic polymorphism. Warfarin is a commonly prescribed oral anticoagulant in patients with prosthetic valve disease, venous thromboembolism and stroke. METHODS AND FINDING: We used a combined pharmacogenetics and iTRAQ-coupled LC-MS/MS pharmacoproteomics approach to analyze plasma protein profiles of 53 patients, and identified a significantly upregulated level of transthyretin precursor in patients receiving a low dose of warfarin but not in those on a high dose of warfarin. In addition, real-time RT-PCR, western blotting and a human IL-6 ELISA were performed to validate the results. CONCLUSION: This combined pharmacogenomics and pharmacoproteomics approach may be applied to other target-based therapies, in matching a particular marker in a subgroup of patients, in addition to the profile of genetic polymorphism.

14.
We have developed an integrated suite of algorithms, statistical methods, and computer applications to support large-scale LC-MS-based gel-free shotgun profiling of complex protein mixtures using basic experimental procedures. The programs automatically detect and quantify large numbers of peptide peaks in feature-rich ion mass chromatograms, compensate for spurious fluctuations in peptide signal intensities and retention times, and reliably match related peaks across many different datasets. Application of this toolkit markedly facilitates pattern recognition and biomarker discovery in global comparative proteomic studies, simplifying mechanistic investigation of physiological responses and the detection of proteomic signatures of disease.

15.
MOTIVATION: There is a pressing need for improved proteomic screening methods allowing for earlier diagnosis of disease, systematic monitoring of physiological responses and the uncovering of fundamental mechanisms of drug action. The combined platform of LC-MS (Liquid-Chromatography-Mass-Spectrometry) has shown promise in moving toward a solution in these areas. In this paper we present a technique for discovering differences in protein signal between two classes of samples of LC-MS serum proteomic data without use of tandem mass spectrometry, gels or labeling. This method works on data from a lower-precision MS instrument, the type routinely used by and available to the community at large today. We test our technique on a controlled (spike-in) but realistic (serum biomarker discovery) experiment which is therefore verifiable. We also develop a new method for helping to assess the difficulty of a given spike-in problem. Lastly, we show that the problem of class prediction, sometimes mistaken as a solution to biomarker discovery, is actually a much simpler problem. RESULTS: Using precision-recall curves with experimentally extracted ground truth, we show that (1) our technique has good performance using seven replicates from each class, (2) performance degrades with decreasing number of replicates, and (3) the signal that we are teasing out is not trivially available (i.e. the differences are not so large that the task is easy). Lastly, we easily obtain perfect classification results for data in which the problem of extracting differences does not produce absolutely perfect results. This emphasizes the different nature of the two problems and also their relative difficulties. AVAILABILITY: Our data are publicly available as a benchmark for further studies of this nature at http://www.cs.toronto.edu/~jenn/LCMS

16.
Protein and peptide mass analysis and amino acid sequencing by mass spectrometry are widely used for identification and annotation of post-translational modifications (PTMs) in proteins. Modification-specific mass increments, neutral losses or diagnostic fragment ions in peptide mass spectra provide direct evidence for the presence of post-translational modifications, such as phosphorylation, acetylation, methylation or glycosylation. However, the commonly used database search engines are not always practical for exhaustive searches for multiple modifications and concomitant missed proteolytic cleavage sites in large-scale proteomic datasets, since the search space is dramatically expanded. We present a formal definition of the problem of searching databases with tandem mass spectra of peptides that are partially (sub-stoichiometrically) modified. In addition, an improved search algorithm and peptide scoring scheme that includes modification-specific ion information from MS/MS spectra was implemented and tested using the Virtual Expert Mass Spectrometrist (VEMS) software. A set of 2825 peptide MS/MS spectra was searched with 16 variable modifications and 6 missed cleavages. The scoring scheme returned a large set of post-translationally modified peptides including precise information on modification type and position. The scoring scheme was able to extract and distinguish the near-isobaric modifications of trimethylation and acetylation of lysine residues based on the presence and absence of diagnostic neutral losses and immonium ions. In addition, the VEMS software contains a range of new features for analysis of mass spectrometry data obtained in large-scale proteomic experiments. Windows binaries are available at http://www.yass.sdu.dk/.

17.
Accurate protein identification in large-scale proteomics experiments relies upon a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we have benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation), and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that high mass accuracy peptide measurements searched against non-assembled reads from DNA sequencing of the same samples significantly increased identifiable proteins without sacrificing accuracy.

18.
Mycobacterium smegmatis is a fast-growing model mycobacterial system that shares many features with the pathogenic Mycobacterium tuberculosis while allowing practical proteomics analysis. With the use of shotgun-style mass spectrometry, we provide a large-scale analysis of the M. smegmatis proteomic response to the anti-tuberculosis (TB) drugs isoniazid, ethambutol, and 5-chloropyrazinamide and elucidate the drugs' systematic effects on mycobacterial proteins. A total of 2550 proteins were identified with approximately 5% false-positive identification rate across 60 experiments, representing approximately 40% of the M. smegmatis proteome (approximately 6500 proteins). Protein differential expression levels were estimated from the shotgun proteomics data, and 485 proteins showing altered expression levels in response to drugs were identified at a 99% confidence level. Proteomic comparison of anti-TB drug responses shows that translation, cell cycle control, and energy production are down-regulated in all three drug treatments. In contrast, systems related to the drugs' targets, such as lipid, amino acid, and nucleotide metabolism, show specific protein expression changes associated with a particular drug treatment. We identify proteins involved in target pathways for the three drugs and infer putative targets for 5-chloropyrazinamide.

19.
Glycoproteomics, or characterizing glycosylation events at a proteome scale, has seen rapid advances in methods for analyzing glycopeptides by tandem mass spectrometry in recent years. These advances have enabled acquisition of far more comprehensive and large-scale datasets, precipitating an urgent need for improved informatics methods to analyze the resulting data. A new generation of glycoproteomics search methods has recently emerged, using glycan fragmentation to split the identification of a glycopeptide into peptide and glycan components and solve each component separately. In this review, we discuss these new methods and their implications for large-scale glycoproteomics, as well as several outstanding challenges in glycoproteomics data analysis, including validation of glycan assignments and quantitation. Finally, we provide an outlook on the future of glycoproteomics from an informatics perspective, noting the key challenges to achieving widespread and reproducible glycopeptide annotation and quantitation.

20.
Protein phosphorylation events are key regulators of cellular signaling processes. In the era of functional genomics, rational drug design programs demand large-scale high-throughput analysis of signal transduction cascades. Significant improvements in the area of mass spectrometry-based proteomics have provided exciting opportunities for rapid progress toward global protein phosphorylation analysis. This review summarizes several recent advances made in the field of phosphoproteomics with an emphasis placed on mass spectrometry instrumentation, enrichment methods and quantification strategies. In the near future, these technologies will provide a tool that can be used for quantitative investigation of signal transduction pathways to generate new insights into biologic systems.
