Similar literature
Retrieved 20 similar articles (search time: 473 ms)
1.
This is the second article in a series intended as a tutorial, providing the interested reader with an overview of concepts not covered in part I: the principles of ion-activation methods, the ability of mass-spectrometric methods to interface with various proteomic strategies, analysis techniques, bioinformatics, and data interpretation and annotation. Although these are distinct topics, a basic, collective understanding of all of them is needed to appreciate how a proteomic experiment is carried out and analyzed. The ion-activation methods used for MS/MS in proteomic research are discussed, including collision-induced dissociation (and post-source decay), surface-induced dissociation, electron-capture and electron-transfer dissociation, and infrared multiphoton and blackbody infrared radiative dissociation. The high dimensionality of data generated by proteomic studies requires an understanding of the underlying analytical procedures, together with improved bioinformatics tools and data-mining approaches, for efficient and statistically sound analysis of biological samples from healthy and diseased individuals and for determining the utility of the interpreted data. Currently available strategies for mass-spectrometric analysis of the proteome are elaborated, including those employed for substantially purified proteins and for complex peptide mixtures, as well as hypothesis-driven strategies. Processing steps prior to the analysis of mass spectrometry data, statistics, the informatics steps currently used in shotgun proteomic experiments, and proteomics ontology are also discussed.

2.
The search for and validation of novel disease biomarkers requires the complementary power of professional study planning and execution, modern profiling technologies, and the bioinformatics tools needed for data analysis and interpretation. Biomarkers have considerable impact on patient care and are urgently needed to advance the diagnosis, prognosis and treatment of disease. This survey highlights emerging bioinformatics methods for biomarker discovery in clinical metabolomics, focusing on data preprocessing and consolidation, and on the data-driven search, verification, prioritization and biological interpretation of putative metabolic candidate biomarkers of disease. In particular, data-mining tools suitable for omic data gathered under the most frequently used experimental designs, such as case-control or longitudinal biomarker cohort studies, are reviewed, and case examples of selected discovery steps are delineated in more detail. The review demonstrates that clinical bioinformatics has evolved into an essential element of biomarker discovery, translating innovations and successes in profiling technologies and bioinformatics into clinical application.

3.
4.

Background

Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, these tools are generally not comparable in functionality, user interface, or information input and output, and they do not readily facilitate appropriate statistical analysis of the data. These limitations, along with the array of choices, present a daunting prospect for biologists and other researchers not trained in bioinformatics who wish to use LC-MS-based quantitative proteomics.

Results

We have developed Corra, a computational framework and set of tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms for LC-MS-based proteomics, together with statistical algorithms originally developed for microarray data analysis, making them appropriate for LC-MS data. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intensive data processing and statistical analyses run on a remote server while the user controls and manages the process from their own computer via a simple web interface. In addition, Corra can output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification by tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling.

Conclusion

The Corra computational framework enables biologists and other researchers to process, analyze and visualize LC-MS data that would otherwise require a complex and unfriendly suite of tools. Corra supports appropriate statistical analyses with controlled false-discovery rates, ultimately informing subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open-source computational platform for LC-MS-based proteomic workflows, and as such addresses an unmet need in the LC-MS proteomics field.
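
Corra's statistical layer is not specified in detail in this abstract; as a hedged illustration of the controlled false-discovery-rate step it describes, the sketch below applies the standard Benjamini-Hochberg procedure to a hypothetical vector of per-feature p-values. Function and variable names are illustrative, not Corra's actual API.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of p-values significant at FDR level alpha."""
    p = np.asarray(pvals)
    n = p.size
    order = np.argsort(p)                          # rank p-values ascending
    thresholds = alpha * np.arange(1, n + 1) / n   # BH step-up thresholds
    passed = p[order] <= thresholds
    mask = np.zeros(n, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()            # largest rank meeting its threshold
        mask[order[:k + 1]] = True                 # all smaller p-values also pass
    return mask

# Hypothetical p-values from a peptide-feature abundance comparison:
pvals = [0.0002, 0.009, 0.013, 0.04, 0.21, 0.49, 0.74]
print(benjamini_hochberg(pvals))                   # features kept at 5% FDR
```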

5.
Keith P. Lewis 《Oikos》2004,104(2):305-315
Ecologists rely heavily upon statistics to make inferences concerning ecological phenomena and to make management recommendations. It is therefore important to use the statistical tests most appropriate for a given dataset. However, inappropriate statistical tests are often applied to categorical data (i.e. count or binary data). Since many types of statistical tests have been used in artificial nest studies, reviewing and comparing these tests provides an opportunity to demonstrate the importance of choosing the most appropriate statistical approach, both for conceptual reasons and for controlling type I and type II error rates.
Artificial nests have routinely been used to study the influence of habitat fragmentation and habitat edges on nest predation. I review the variety of statistical tests used to analyze artificial nest data within the framework of the generalized linear model and argue that logistic regression is the most appropriate and flexible statistical test for analyzing binary datasets. Using artificial nest data from my own studies and an independent dataset from the medical literature as examples, I tested equivalent data using a variety of statistical methods, then compared the p-values and the statistical power of these tests. Results vary greatly among statistical methods. Methods inappropriate for binary data often fail to yield significant results even when differences between study groups appear large, whereas logistic regression finds these differences statistically significant. Statistical power is 2–3 times higher for logistic regression than for the other tests. I recommend that logistic regression be used to analyze artificial nest data and other binary datasets.
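
A hedged sketch of the recommended analysis, using simulated rather than the paper's data: the example below fits a logistic regression to binary predation outcomes with statsmodels and contrasts it with a chi-square test on the same data. The edge effect and sample sizes are assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
edge = np.repeat([0, 1], 100)               # 0 = forest interior, 1 = habitat edge
p_true = np.where(edge == 1, 0.55, 0.35)    # assumed predation probabilities
depredated = rng.binomial(1, p_true)        # binary outcome per artificial nest

# Logistic regression: models the log-odds of predation as a function of edge.
X = sm.add_constant(edge.astype(float))
fit = sm.Logit(depredated, X).fit(disp=0)
print("logistic regression p:", fit.pvalues[1])

# Chi-square test on the corresponding 2x2 table, for comparison.
table = [[np.sum((edge == g) & (depredated == d)) for d in (0, 1)] for g in (0, 1)]
chi2, p_chi, dof, _ = chi2_contingency(table)
print("chi-square p:", p_chi)
```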

6.
Phenotypic characterization of individual cells provides crucial insights into intercellular heterogeneity and gives access to information that is unavailable from ensemble-averaged, bulk-cell analyses. Single-cell studies have attracted significant interest in recent years and spurred the development of a variety of commercially available and research-grade technologies. To quantify cell-to-cell variability, we have developed an experimental platform for real-time measurement of oxygen consumption (OC) kinetics at the single-cell level. These single-cell measurements pose unique challenges, and no existing data analysis methodology addresses them. Here we present a data processing and analysis method that handles this unique type of data and extracts biologically relevant information. We applied the method to OC profiles obtained from single cells of two cell lines derived from metaplastic and dysplastic human Barrett's esophageal epithelium. Three main challenges of this heterogeneous dynamic system guided the method's development: (i) high levels of noise, (ii) the lack of a priori knowledge of single-cell dynamics, and (iii) the role of intercellular variability within and across cell types. Several strategies and solutions to each of these challenges are presented. Features such as slopes, intercepts and breakpoints (change-points) were extracted from every OC profile and compared across individual cells and cell types. The results demonstrate that the extracted features expose subtle differences between individual cells and their responses to cell-cell interactions. With minor modifications, the method can process and analyze single-cell data from other acquisition and experimental modalities, providing a valuable statistical framework for single-cell analysis.
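
The authors' exact feature-extraction pipeline is not given in this abstract; the sketch below shows one common way to recover a slope and a change-point from a noisy single-cell oxygen-consumption profile: a brute-force search for the breakpoint minimizing the combined squared error of two fitted line segments. The simulated profile and all names are assumptions.

```python
import numpy as np

def fit_breakpoint(t, y, min_pts=3):
    """Index that splits y into two line segments with minimal total squared error."""
    best_sse, best_k = np.inf, None
    for k in range(min_pts, len(t) - min_pts):
        sse = 0.0
        for seg in (slice(None, k), slice(k, None)):
            coef = np.polyfit(t[seg], y[seg], 1)                  # slope, intercept
            sse += np.sum((np.polyval(coef, t[seg]) - y[seg]) ** 2)
        if sse < best_sse:
            best_sse, best_k = sse, k
    return best_k

rng = np.random.default_rng(0)
t = np.linspace(0, 30, 120)                                       # minutes (simulated)
y = np.where(t < 12, -0.8 * t, -9.6 - 0.2 * (t - 12))             # kink at t = 12
y = y + rng.normal(0, 0.4, t.size)                                # high-noise regime
print("estimated change-point at t =", round(t[fit_breakpoint(t, y)], 1))
```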

7.
As projects progress beyond pilot studies with a few simple variables and small samples, the research process as a whole becomes qualitatively more complex and subject to contamination by an array of errors and mistakes. Data usually undergo a series of manipulations (e.g., recording, computer entry, transmission) prior to final statistical analysis. The process thus consists of numerous operations, ending only with the eventual statistical analysis and write-up. We present a means of estimating the impact of process error in the same terms as psychometric reliability and discuss the implications for reducing the impact of errors on overall data quality.
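
The authors' estimator is not reproduced in this abstract. As a loosely analogous, heavily hedged illustration, the sketch below treats each data-handling step as having an independent per-record reliability and compounds them, showing how modest per-step error rates erode overall data quality; the step names and rates are invented, not the paper's figures.

```python
from math import prod

# Hypothetical per-step reliabilities: fraction of records surviving each step intact.
steps = {"recording": 0.99, "computer entry": 0.97, "transmission": 0.995, "coding": 0.98}

overall = prod(steps.values())          # assumes errors are independent across steps
print(f"overall process reliability: {overall:.3f}")
# ~0.936: even with every step >= 97% reliable, >6% of records carry an error.
```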

8.
Bioinformatics analysis of alternative splicing
Over the past few years, the analysis of alternative splicing using bioinformatics has emerged as an important new field, and has significantly changed our view of genome function. One exciting front has been the analysis of microarray data to measure alternative splicing genome-wide. Pioneering studies of both human and mouse data have produced algorithms for discerning evidence of alternative splicing and clustering genes and samples by their alternative splicing patterns. Moreover, these data indicate the presence of alternative splice forms in up to 80 per cent of human genes. Comparative genomics studies in both mammals and insects have demonstrated that alternative splicing can in some cases be predicted directly from comparisons of genome sequences, based on heightened sequence conservation and exon length. Such studies have also provided new insights into the connection between alternative splicing and a variety of evolutionary processes such as Alu-based exonisation, exon creation and loss. A number of groups have used a combination of bioinformatics, comparative genomics and experimental validation to identify new motifs for splice regulatory factors, analyse the balance of factors that regulate alternative splicing, and propose a new mechanism for regulation based on the interaction of alternative splicing and nonsense-mediated decay. Bioinformatics studies of the functional impact of alternative splicing have revealed a wide range of regulatory mechanisms, from NAGNAG sites that add a single amino acid; to short peptide segments that can play surprisingly complex roles in switching protein conformation and function (as in the Piccolo C2A domain); to events that entirely remove a specific protein interaction domain or membrane anchoring domain. Common to many bioinformatics studies is a new emphasis on graph representations of alternative splicing structures, which have many advantages for analysis.
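
The graph representations mentioned above can be made concrete with a small sketch: below, a hypothetical gene's exons are nodes in a directed splice graph and every source-to-sink path enumerates one candidate isoform. The exons and junctions are invented for illustration.

```python
# Directed splice graph: nodes are exons, edges are observed splice junctions.
splice_graph = {
    "e1": ["e2", "e3"],   # e2 is a cassette exon; the e1 -> e3 junction skips it
    "e2": ["e3"],
    "e3": ["e4"],
    "e4": [],
}

def isoforms(graph, node, path=()):
    """Depth-first enumeration of exon chains from `node` to a terminal exon."""
    path = path + (node,)
    if not graph[node]:
        yield path
    for nxt in graph[node]:
        yield from isoforms(graph, nxt, path)

for iso in isoforms(splice_graph, "e1"):
    print(" -> ".join(iso))   # e1 -> e2 -> e3 -> e4  and  e1 -> e3 -> e4
```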

9.
10.
Modern technologies have rapidly transformed biology into a data-intensive discipline. In addition to the enormous amount of experimental data already in the literature, every new study can produce a large amount of new data, leading to novel ideas and more publications. To understand a biological process as completely as possible, scientists must be able to combine and analyze all such information. Not only molecular biology and bioinformatics but all other domains of biology, including plant biology, require tools and technologies that enable experts to capture knowledge from distributed and heterogeneous sources of information. Ontologies have proven to be one of the most useful means of constructing and formalizing expert knowledge. The key feature of an ontology is that it represents a computer-interpretable model of a particular subject area. This article outlines the importance of ontologies for systems biology, data integration and information analysis, illustrated through the example of reactive oxygen species (ROS) signaling networks in plants.
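
As a minimal sketch of what "computer-interpretable" means in practice, the toy ontology below encodes is_a relations among invented ROS-signaling terms and answers a transitive query over them; real work would use a formal ontology language such as OBO or OWL rather than a Python dict.

```python
# Toy is_a hierarchy (term names invented for illustration).
is_a = {
    "superoxide response": "ROS response",
    "H2O2 response": "ROS response",
    "ROS response": "oxidative stress response",
    "oxidative stress response": "stress response",
}

def ancestors(term):
    """Walk is_a links to the root, making subsumption queries trivial."""
    while term in is_a:
        term = is_a[term]
        yield term

print(list(ancestors("H2O2 response")))
# ['ROS response', 'oxidative stress response', 'stress response']
```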

11.
Interest in gene-environment interaction studies has grown significantly with advances in molecular genetics techniques. It has become practical to investigate the role of environmental factors in disease risk and hence their role as modifiers of genetic effects. The recognition that genetics matters in the uptake and metabolism of toxic substances exemplifies how genetic profiles can modify the effect of important environmental risk factors on disease. Several rationales exist for setting up gene-environment interaction studies, and the technical challenges of such studies, when the number of environmental or genetic risk factors is relatively small, have been described before. In the post-genomic era it is now possible to study thousands of genes and their interaction with the environment, which brings a whole range of new challenges and opportunities. Despite continuing efforts to develop efficient methods and optimal bioinformatics infrastructures for the available wealth of data, the challenge remains how best to present and analyze genome-wide environmental interaction (GWEI) studies involving multiple genetic and environmental factors. Because GWEIs sit at the intersection of statistical genetics, bioinformatics and epidemiology, they face many of the same problems as genome-wide gene-gene interaction studies. Additional complexities must be considered, however, that are typical of large-scale epidemiological studies but also arise from joining two heterogeneous types of data to explain complex disease trait variation or for prediction purposes.
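
As a hedged sketch of the core GWEI test at a single locus, the example below fits a logistic model with a gene-by-environment product term to simulated data and reads off the interaction p-value; a genome-wide analysis would repeat this per variant with multiple-testing control. All data and names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 2000
df = pd.DataFrame({
    "G": rng.binomial(2, 0.3, n),    # genotype coded 0/1/2 (minor-allele count)
    "E": rng.binomial(1, 0.4, n),    # binary environmental exposure
})
# Simulated disease risk containing a genuine G x E interaction effect.
lin = -1.5 + 0.1 * df.G + 0.2 * df.E + 0.5 * df.G * df.E
df["disease"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

fit = smf.logit("disease ~ G * E", data=df).fit(disp=0)   # expands to G + E + G:E
print("interaction p-value:", fit.pvalues["G:E"])
```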

12.
Advances in bioinformatics research on transcription factor binding sites
侯琳  钱敏平  朱云平  邓明华 《遗传》2009,31(4):365-373
Using genomic in situ hybridization (GISH) on root somatic chromosomes of the allotetraploid derived from the cross Gossypium arboreum × G. bickii, with genomic DNA (gDNA) of G. bickii as a probe, two sets of chromosomes of 26 each were readily distinguished from each other by their distinctive hybridization signals. GISH analysis directly proved that the hybrid G. arboreum × G. bickii is an allotetraploid amphiploid. The karyotype formula of the species was 2n = 4x = 52 = 46m (4sat) + 6sm (4sat). We identified four pairs of satellites, with two pairs in each sub-genome. FISH analysis using 45S rDNA as a probe showed that the cross G. arboreum × G. bickii contained 14 NORs. At least five pairs of chromosomes in the G sub-genome showed double hybridization signals (red and blue) in their long arms, indicating that chromatin introgression from the A sub-genome had occurred.

13.
Global gel-free proteomic analysis by mass spectrometry has been widely used as an important tool for exploring complex biological systems at the whole-genome level. Simultaneous analysis of a large number of protein species is a complicated and challenging task, with challenges at every stage of a global gel-free proteomic analysis: experimental design, peptide/protein identification, data preprocessing and normalization, and inferential analysis. Alongside efforts to improve the analytical technologies, statistical methodologies have been applied at all of these stages to help extract relevant information efficiently from large proteomic datasets. In this review, we summarize current applications of statistics across the stages of global gel-free proteomic analysis by mass spectrometry and discuss the challenges associated with various statistical tools. Wherever possible, we also propose potential solutions for improving data collection and interpretation in mass-spectrometry-based global proteomic analysis using more sophisticated and/or novel statistical approaches.
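
The review's methods are not detailed in this abstract; as one hedged example of the preprocessing and normalization stage it covers, the sketch below log-transforms a small hypothetical peptide-intensity matrix and median-centers each LC-MS run so that between-run intensity shifts are not mistaken for abundance changes.

```python
import numpy as np
import pandas as pd

# Hypothetical peptide x run intensity matrix (arbitrary units).
raw = pd.DataFrame(
    {"run1": [1.2e6, 3.4e5, 8.9e4], "run2": [2.6e6, 7.1e5, 1.9e5]},
    index=["PEPTIDER", "SAMPLEK", "EXAMPLEK"],
)

log2 = np.log2(raw)                        # stabilizes variance; makes ratios additive
normalized = log2 - log2.median(axis=0)    # median-center each run (column)
print(normalized.round(2))                 # run medians now aligned at 0
```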

14.
The post-genomic era presents many new challenges for the field of bioinformatics. Novel computational approaches are now being developed to handle the large, complex and noisy datasets produced by high throughput technologies. Objective evaluation of these methods is essential (i) to assure high quality, (ii) to identify strong and weak points of the algorithms, (iii) to measure the improvements introduced by new methods and (iv) to enable non-specialists to choose an appropriate tool. Here, we discuss the development of formal benchmarks, designed to represent the current problems encountered in the bioinformatics field. We consider several criteria for building good benchmarks and the advantages to be gained when they are used intelligently. To illustrate these principles, we present a more detailed discussion of benchmarks for multiple alignments of protein sequences. As in many other domains, significant progress has been achieved in the multiple alignment field and the datasets have become progressively more challenging as the existing algorithms have evolved. Finally, we propose directions for future developments that will ensure that the bioinformatics benchmarks correspond to the challenges posed by the high throughput data.

15.
AJ Thompson  M Abu  DP Hanger 《Amino acids》2012,43(3):1075-1085
Proteomic technologies have matured to a level enabling accurate and reproducible quantitation of peptides and proteins from complex biological matrices. Analysis of samples as diverse as assembled protein complexes, whole-cell lysates, sub-cellular proteomes from cell cultures, and animal and human tissues and fluids demonstrates the versatility of the fundamental technique underlying most proteomic applications today: mass spectrometry. Determining the mass of biomolecules and their fragments or related products with high accuracy can provide a highly specific assay for detection and identification. Importantly, ion currents representative of these specifically identified analytes can be accurately quantified with the correct application of smart isobaric tagging chemistries, heavy and light isotopically derivatised samples or standards, or careful workflows that compare unlabelled samples in so-called 'label-free' and targeted selected-reaction-monitoring experiments. In terms of exploring biology, a myriad of protein changes and modifications are increasingly being probed and quantified, from relatively decisive chemical changes such as protein splicing and truncation to more transient, dynamic modifications such as phosphorylation, acetylation and ubiquitination. Proteomic workflows can be complex beasts, and several key considerations for ensuring effective application have been outlined in the recent literature. The past year has seen the publication of several excellent reviews that thoroughly describe the fundamental principles underlying the state of the art. This review elaborates on specific critical issues raised by these publications and addresses other important considerations and new developments that directly affect the effectiveness of proteomic technologies, particularly, though not exclusively, for peptide-centric experiments. These factors are discussed both for qualitative analyses, including dynamic range and sampling issues and developments to improve the translation of peptide fragmentation data into peptide and protein identities, and for quantitative analyses, including data normalisation, the utility of ontology and functional annotation, the effects of modified peptides, and experimental designs that facilitate robust statistical methods.
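
As a hedged sketch of the mass determination underlying such specific assays, the code below computes a peptide's monoisotopic mass from standard residue masses and the m/z expected for a chosen charge state. The masses are the commonly tabulated values; the peptide itself is invented.

```python
# Standard monoisotopic residue masses (Da).
RES = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
       "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
       "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
       "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
       "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER, PROTON = 18.010565, 1.007276

def mz(peptide, z):
    """Monoisotopic m/z of [M + zH]z+ for an unmodified peptide."""
    mass = sum(RES[aa] for aa in peptide) + WATER   # residue masses + terminal H2O
    return (mass + z * PROTON) / z

print(round(mz("SAMPLER", 2), 4))   # invented peptide, doubly protonated
```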

16.
Background

Functional genomics employs dozens of OMICs technologies to explore the functions of DNA, RNA and protein regulators in gene regulation. Although each of these technologies is powerful on its own, like the blind men and the elephant, any single technology gives only a limited picture of the complex regulatory system. Integrative OMICs approaches have therefore emerged and become an important area in biology and medicine, providing a precise and effective way to study gene regulation.

Results

This article reviews currently popular OMICs technologies, strategies for integrating OMICs data, and the bioinformatics tools used for multi-dimensional data integration. We highlight the advantages of these methods, particularly for elucidating the molecular basis of biological regulatory mechanisms.

Conclusions

To better understand the complexity of biological processes, we need powerful bioinformatics tools to integrate these OMICs data. Integrating multi-dimensional OMICs data generates novel insights into system-level gene regulation and serves as a foundation for further hypothesis-driven research.
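
The integration strategies reviewed are varied; as one minimal, hedged illustration of data-level integration, the sketch below joins hypothetical transcript and protein measurements on a shared gene identifier and flags genes where the two layers disagree, exactly the kind of discordance a single-technology view would miss.

```python
import pandas as pd

# Hypothetical per-gene measurements from two OMICs layers.
rna = pd.DataFrame({"gene": ["g1", "g2", "g3"], "rna_log2fc": [2.1, -0.2, 1.8]})
prot = pd.DataFrame({"gene": ["g1", "g3", "g4"], "prot_log2fc": [1.9, -1.2, 0.4]})

merged = rna.merge(prot, on="gene", how="inner")          # integrate on shared gene IDs
merged["discordant"] = merged.rna_log2fc * merged.prot_log2fc < 0
print(merged)   # g3 is up at the transcript level but down at the protein level
```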

17.
Many biologists believe that data analysis expertise lags behind the capacity for producing high-throughput data. One view within the bioinformatics community is that biological scientists need to develop algorithmic skills to meet the demands of the new technologies. In this article, we argue that the broader concept of inferential literacy, which includes understanding of data characteristics, experimental design and statistical analysis, in addition to computation, more adequately encompasses what is needed for efficient progress in high-throughput biology.

18.
陈铭 《生物信息学》2022,20(2):75-83
As biological data measurement technologies continue to advance, the types, volume and complexity of biological data keep growing, and bioinformatics has entered the big-data era. Faced with the multimodal, multilevel, high-dimensional and nonlinear complex biological data of this era, bioinformatics must develop corresponding methods and technologies for effective integration in both research and application. This article surveys and discusses data integration, method integration, system integration and related issues in integrative bioinformatics in the big-data era.

19.
Mass spectrometry is a technique widely employed for the identification and characterization of proteins. Bioinformatics plays a fundamental role in processing mass spectrometry data because of the sheer volume of data the technique can produce. To handle these data efficiently, new software packages and algorithms are continuously being developed to improve protein identification and characterization in terms of throughput and statistical accuracy. However, many limitations remain in the bioinformatic processing of spectral data. This review aims to critically cover recent and future developments in bioinformatics approaches to mass spectrometry data analysis for proteomics studies.

20.
Proteomic studies involve the identification and the qualitative and quantitative comparison of proteins expressed under different conditions, along with elucidation of their properties and functions, usually in a large-scale, high-throughput format. The high dimensionality of the data generated by these studies requires the development of improved bioinformatics tools and data-mining approaches for efficient and accurate analysis of biological specimens from healthy and diseased individuals. Mining large proteomics datasets provides a better understanding of the differences between the normal and abnormal cell proteomes of various biological systems, including responses to environmental hazards, infectious agents (bioterrorism) and cancers. This review sheds light on recent developments in bioinformatics and data-mining approaches, and on their limitations when applied to proteomics datasets, in order to strengthen the interdependence between proteomic technologies and bioinformatics tools.
