Similar references
20 similar records found (search time: 31 ms)
1.
Proteomic studies involve the identification as well as qualitative and quantitative comparison of proteins expressed under different conditions, and the elucidation of their properties and functions, usually in a large-scale, high-throughput format. The high dimensionality of the data generated by these studies will require the development of improved bioinformatics tools and data-mining approaches for efficient and accurate analysis of biological specimens from healthy and diseased individuals. Mining large proteomics data sets provides a better understanding of the differences between the normal and abnormal cell proteomes of various biological systems, including those affected by environmental hazards, infectious agents (bioterrorism) and cancers. This review will shed light on recent developments in bioinformatics and data-mining approaches, and their limitations when applied to proteomics data sets, in order to strengthen the interdependence between proteomic technologies and bioinformatics tools.

2.
Positron emission tomography (PET) has proved to be a highly successful technique in the qualitative and quantitative exploration of the human brain's neurotransmitter-receptor systems. In recent years, the number of PET radioligands, targeted to different neuroreceptor systems of the human brain, has increased considerably. This development paves the way for a simultaneous analysis of different receptor systems and subsystems in the same individual. The detailed exploration of the versatility of neuroreceptor systems requires novel technical approaches, capable of operating on huge parametric image datasets. An initial step of such explorative data processing and analysis should be the development of novel exploratory data-mining tools to gain insight into the "structure" of complex multi-individual, multi-receptor data sets. For practical reasons, a possible and feasible starting point of multi-receptor research can be the analysis of the pre- and post-synaptic binding sites of the same neurotransmitter. In the present study, we propose an unsupervised, unbiased data-mining tool for this task and demonstrate its usefulness by using quantitative receptor maps, obtained with positron emission tomography, from five healthy subjects on (pre-synaptic) serotonin transporters (5-HTT or SERT) and (post-synaptic) 5-HT(1A) receptors. Major components of the proposed technique include the projection of the input receptor maps to a feature space, the quasi-clustering and classification of projected data (neighbourhood formation), trans-individual analysis of neighbourhood properties (trajectory analysis), and the back-projection of the results of trajectory analysis to normal space (creation of multi-receptor maps). The resulting multi-receptor maps suggest that complex relationships and tendencies in the relationship between pre- and post-synaptic transporter-receptor systems can be revealed and classified by using this method. 
As an example, we demonstrate the regional correlation of the serotonin transporter-receptor systems. These parameter-specific multi-receptor maps can usefully guide researchers in their endeavour to formulate models of multi-receptor interactions and changes in the human brain.

3.
Uniformly repeated DNA sequences in genomes, known as tandem repeats, are one of the most interesting features of many organisms analyzed so far. Among tandem repeats, microsatellites have attracted many researchers because of their association with several human diseases. The discovery of tandem repeats in expressed sequence tags (ESTs) and in cDNA libraries has contributed new ideas and tools for evolutionary studies. With the advent of new biotechnological tools, the number of ESTs deposited in databases is rapidly increasing. Therefore, new informative bioinformatics tools are needed to assist the analysis and interpretation of these tandem repeats in ESTs and in other types of DNA sequences. In the present study we report two new utility tools, Organism Miner and Keyword Finder. Organism Miner collects, sorts and splices DNA data files and provides a statistical overview of them. Keyword Finder analyses all the sequences in an input folder, extracts and collects keywords for each specific organism, or for all organisms with DNA sequences, and generates a statistical overview. We are currently generating cotton and pepper cDNA libraries and often use GenBank DNA sequences; therefore, in this study we used cDNAs and ESTs of cotton and pepper to demonstrate the use of these two tools. With the help of these two utilities we observed that most ESTs are useful for downstream applications such as mining microsatellites specific to an organ, tissue or developmental stage. The analyses indicated not only that tandem repeats exist in ESTs, but also that tandem repeats are differentially represented in different organ- or tissue-specific ESTs, both within and between species. The utilities and sample data sets are self-extracting files and are freely available from or can be obtained upon request from the corresponding author.
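The abstract does not include the tools themselves, but the core task of scanning an EST for microsatellites can be illustrated with a short, hypothetical sketch (the function name, unit sizes and repeat threshold below are our own choices, not part of Organism Miner or Keyword Finder):

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Naive tandem-repeat scan over an uppercase DNA string.

    Reports (start, repeat_unit, repeat_count) for units of min_unit..max_unit
    bases repeated at least min_repeats times. The zero-width lookahead means
    phase-shifted overlapping hits are also reported (a real tool would merge
    them); thresholds here are illustrative only.
    """
    hits = []
    for k in range(min_unit, max_unit + 1):
        pattern = r"(?=(([ACGT]{%d})\2{%d,}))" % (k, min_repeats - 1)
        for m in re.finditer(pattern, seq):
            hits.append((m.start(), m.group(2), len(m.group(1)) // k))
    return hits

# A toy EST fragment containing an (AC)5 and an (ATG)3 repeat:
est = "TTACACACACACGGATGATGATGCC"
print(find_microsatellites(est))
```

Running the scan on the toy fragment above reports the (AC)5 stretch starting at position 2 and the (ATG)3 stretch starting at position 14, plus their phase-shifted overlaps.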

4.
Asymmetric regression is an alternative to conventional linear regression that allows us to model the relationship between predictor variables and the response variable while accommodating skewness. Advantages of asymmetric regression include incorporating realistic ecological patterns observed in data, robustness to model misspecification and less sensitivity to outliers. Bayesian asymmetric regression relies on asymmetric distributions such as the asymmetric Laplace (ALD) or asymmetric normal (AND) in place of the normal distribution used in classic linear regression models. Asymmetric regression concepts can be used for process and parameter components of hierarchical Bayesian models and have a wide range of applications in data analyses. In particular, asymmetric regression allows us to fit more realistic statistical models to skewed data and pairs well with Bayesian inference. We first describe asymmetric regression using the ALD and AND. Second, we show how the ALD and AND can be used for Bayesian quantile and expectile regression for continuous response data. Third, we consider an extension to generalize Bayesian asymmetric regression to survey data consisting of counts of objects. Fourth, we describe a regression model using the ALD, and show that it can be applied to add needed flexibility, resulting in better predictive models compared to Poisson or negative binomial regression. We demonstrate concepts by analyzing a data set consisting of counts of Henslow’s sparrows following prescribed fire and provide annotated computer code to facilitate implementation. Our results suggest Bayesian asymmetric regression is an essential component of a scientist’s statistical toolbox.
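A minimal numeric sketch of the ALD-quantile link the abstract relies on (this is not the authors' annotated code; data and names are invented): maximizing an asymmetric-Laplace likelihood in its location parameter is equivalent to minimizing the "check" (pinball) loss, so the fitted location lands on the tau-th quantile of a skewed response.

```python
import numpy as np

def pinball(u, tau):
    """ALD negative log-likelihood kernel: the 'check' (pinball) loss."""
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

rng = np.random.default_rng(42)
y = rng.exponential(scale=2.0, size=20000)   # right-skewed response (toy data)

tau = 0.9                                    # target quantile level
grid = np.linspace(y.min(), y.max(), 2000)   # candidate location parameters
losses = np.array([pinball(y - m, tau).sum() for m in grid])
mu_hat = grid[losses.argmin()]               # ALD location estimate

# The pinball-loss minimizer coincides with the empirical 0.9 quantile:
print(mu_hat, np.quantile(y, tau))
```

A full Bayesian treatment would place priors on the location and scale and sample the posterior, but the loss-function identity above is what makes the ALD suitable for quantile regression.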

5.
Kim SH, Yi SV. Genetica 2007, 131(2):151-156
The underlying relationship between functional variables and sequence evolutionary rates is often assessed by partial correlation analysis. However, this strategy is impeded by the difficulty of conducting meaningful statistical analysis using noisy biological data. A recent study suggested that partial correlation analysis is misleading when data are noisy and that principal component regression analysis is a better tool for analyzing biological data. In this paper, we evaluate how these two statistical tools (partial correlation and principal component regression) perform when data are noisy. Contrary to the earlier conclusion, we found that the two tools perform comparably in most cases. Furthermore, when there is more than one ‘true’ independent variable, partial correlation analysis delivers a better representation of the data. Employing both tools may provide a more complete and complementary representation of the real data. In this light, and with new analyses, we suggest that protein length and gene dispensability play significant, independent roles in yeast protein evolution. Electronic supplementary material is available in the online version of this article and is accessible to authorized users.
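A hypothetical sketch of the partial-correlation idea under comparison (the simulated "shared driver" structure and all variable names are our assumptions, not the study's data): regressing out a confounding variable can remove an apparent correlation that plain correlation reports.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out z from both (residual method)."""
    def resid(a, b):
        Z = np.column_stack([np.ones_like(b), b])
        beta, *_ = np.linalg.lstsq(Z, a, rcond=None)
        return a - Z @ beta
    return np.corrcoef(resid(x, z), resid(y, z))[0, 1]

rng = np.random.default_rng(1)
n = 10000
z = rng.normal(size=n)                  # shared driver (e.g. expression level)
x = z + rng.normal(scale=0.5, size=n)   # noisy "functional variable"
y = z + rng.normal(scale=0.5, size=n)   # noisy "evolutionary rate"

r_plain = np.corrcoef(x, y)[0, 1]       # strong marginal correlation (~0.8)
r_part = partial_corr(x, y, z)          # near zero once z is controlled for
print(r_plain, r_part)
```

Principal component regression would instead regress the response on leading principal components of the predictors; the paper's point is that with noisy data the two approaches often give comparable answers.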

6.
The Botany Array Resource provides the means for obtaining and archiving microarray data for Arabidopsis thaliana, as well as biologist-friendly tools for viewing and mining both our own and others' data, for example from the AtGenExpress Consortium. All the data produced are publicly available through the web interface of the database at http://bbc.botany.utoronto.ca. The database has been designed in accordance with the Minimum Information About a Microarray Experiment convention -- all expression data are associated with the corresponding experimental details. The database is searchable and it also provides a set of useful and easy-to-use web-based data-mining tools for researchers, with sophisticated yet understandable output graphics. These include Expression Browser for performing 'electronic Northerns', Expression Angler for identifying genes that are co-regulated with a gene of interest, and Promomer for identifying potential cis-elements in the promoters of individual or co-regulated genes.
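The co-regulation query that tools like Expression Angler perform can be sketched as ranking genes by expression similarity to a query gene. The toy data, matrix layout and function name below are invented for illustration and are not the BAR implementation:

```python
import numpy as np

def coexpressed(expr, gene_idx, top=3):
    """Rank other genes by Pearson correlation with the query gene.

    expr: genes x conditions expression matrix (toy stand-in for array data).
    """
    r = np.corrcoef(expr)[gene_idx]          # correlation of query vs. all genes
    order = np.argsort(-r)                   # most correlated first
    return [int(g) for g in order if g != gene_idx][:top]

rng = np.random.default_rng(0)
base = rng.normal(size=20)                   # shared profile across 20 conditions
expr = np.vstack([
    base + rng.normal(scale=0.1, size=20),   # gene 0: the query
    base + rng.normal(scale=0.1, size=20),   # gene 1: co-regulated with it
    rng.normal(size=20),                     # gene 2: unrelated
    -base + rng.normal(scale=0.1, size=20),  # gene 3: anti-correlated
])
print(coexpressed(expr, 0, top=1))  # gene 1 ranks first
```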

7.
This is the second article in a series intended as a tutorial, providing the interested reader with an overview of concepts not covered in part I, such as: the principles of ion-activation methods, the ability of mass-spectrometric methods to interface with various proteomic strategies, analysis techniques, bioinformatics, and data interpretation and annotation. Although these are different topics, it is important that a reader has a basic, collective understanding of all of them for an overall appreciation of how to carry out and analyze a proteomic experiment. Different ion-activation methods for MS/MS, such as collision-induced dissociation (including postsource decay), surface-induced dissociation, electron-capture and electron-transfer dissociation, and infrared multiphoton and blackbody infrared radiative dissociation, are discussed because they are used in proteomic research. The high dimensionality of data generated from proteomic studies requires an understanding of the underlying analytical procedures used to obtain these data, as well as the development of improved bioinformatics tools and data-mining approaches for efficient and accurate statistical analyses of biological samples from healthy and diseased individuals, in addition to determining the utility of the interpreted data. Currently available strategies for the analysis of the proteome by mass spectrometry, such as those employed for the analysis of substantially purified proteins and complex peptide mixtures, as well as hypothesis-driven strategies, are also elaborated upon. Processing steps prior to the analysis of mass spectrometry data, statistics, the several informatics steps currently used in shotgun proteomic experiments, and proteomics ontology are also discussed.

8.

Objectives

Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.

Methods

In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation, followed by confirmatory MRI between 2007 and 2011, were identified. MRI was used as the reference standard to classify rotator cuff tears. The predictor variables were the clinical assessment results, which consisted of 16 attributes. This study employed two data-mining methods (an artificial neural network (ANN) and a decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratios and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.

Results

Our proposed data-mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve for predicting a rotator cuff tear were statistically better in the ANN and decision tree models than in logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear from a pretest probability and a prediction result (tear or no tear).

Conclusions

Our predictive data-mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools for classifying rotator cuff tears and for determining the probability of the presence of the disease, enhancing diagnostic decision making for rotator cuff tears.
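The likelihood-ratio step this abstract describes is ordinary Bayes' theorem in odds form, which is exactly the computation a Fagan nomogram performs graphically. The sensitivity, specificity and pretest probability below are made-up illustration values, not figures from the study:

```python
def post_test_probability(pretest_p, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pretest odds * LR."""
    pre_odds = pretest_p / (1.0 - pretest_p)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical classifier performance (not the study's numbers):
sens, spec = 0.90, 0.80
lr_pos = sens / (1 - spec)        # LR+ = 4.5: model predicts "tear"
lr_neg = (1 - sens) / spec        # LR- = 0.125: model predicts "no tear"

p = 0.5                           # assumed pretest probability of a tear
print(round(post_test_probability(p, lr_pos), 3))  # 0.818
print(round(post_test_probability(p, lr_neg), 3))  # 0.111
```

With these illustrative numbers, a positive model prediction raises the probability of a tear from 50% to about 82%, and a negative prediction lowers it to about 11%.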

9.
10.
Inexpensive computational power combined with high-throughput experimental platforms has created a wealth of biological information requiring analytical tools and techniques for interpretation. Graph-theoretic concepts and tools have provided an important foundation for information visualization, integration, and analysis of datasets, but they have often been relegated to background analysis tasks. GT-Miner is designed for visual data analysis and mining operations, interacts with other software, including databases, and works with diverse data types. It facilitates a discovery-oriented approach to data mining wherein exploration of alterations of the data and variations of the visualization is encouraged. The user is presented with a basic iterative process, consisting of loading, visualizing, transforming, and then storing the resultant information. Complex analyses are built-up through repeated iterations and user interactions. The iterative process is optimized by automatic layout following transformations and by maintaining a current selection set of interest for elements modified by the transformations. Multiple visualizations are supported including hierarchical, spring, and force-directed self-organizing layouts. Graphs can be transformed with an extensible set of algorithms or manually with an integral visual editor. GT-Miner is intended to allow easier access to visual data mining for the non-expert.

11.

12.
Protein expression profiling is increasingly being used to discover, validate and characterize biomarkers that can potentially be used for diagnostic purposes and to aid in pharmaceutical development. Correct analysis of data obtained from these experiments requires an understanding of the underlying analytic procedures used to obtain the data, statistical principles underlying high-dimensional data and clinical statistical tools used to determine the utility of the interpreted data. This review summarizes each of these steps, with the goal of providing the nonstatistician proteomics researcher with a working understanding of the various approaches that may be used by statisticians. Emphasis is placed on the process of mining high-dimensional data to identify a specific set of biomarkers that may be used in a diagnostic or other assay setting.

13.
14.
An integrated software system for analyzing ChIP-chip and ChIP-seq data
Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH. Nature Biotechnology 2008, 26(11):1293-1300

15.
A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. The program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology.

16.

Background  

It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. In all data-mining methods, it is important to consider internal validation procedures to obtain prediction estimates to prevent model over-fitting and reduce potential false positive findings. Currently, MDR utilizes cross-validation for internal validation. In this study, we incorporate the use of a three-way split (3WS) of the data in combination with a post-hoc pruning procedure as an alternative to cross-validation for internal model validation to reduce computation time without impairing performance. We compare the power to detect true disease causing loci using MDR with both 5- and 10-fold cross-validation to MDR with 3WS for a range of single-locus and epistatic disease models. Additionally, we analyze a dataset in HIV immunogenetics to demonstrate the results of the two strategies on real data.
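In the generic sense, a three-way split partitions the data once into training, validation (pruning) and test sets instead of rotating folds as cross-validation does. The 60/20/20 fractions and function name below are illustrative only; the paper's exact split and post-hoc pruning procedure are not reproduced here:

```python
import numpy as np

def three_way_split(n, seed=0, frac=(0.6, 0.2, 0.2)):
    """Disjoint train/validation/test index split (a 3WS alternative to CV)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                 # shuffle sample indices once
    n_train = int(frac[0] * n)
    n_val = int(frac[1] * n)
    return (idx[:n_train],                   # fit candidate models
            idx[n_train:n_train + n_val],    # select/prune among candidates
            idx[n_train + n_val:])           # estimate prediction accuracy once

train, val, test = three_way_split(100)
print(len(train), len(val), len(test))  # 60 20 20
```

The computational saving over k-fold cross-validation comes from fitting each candidate model once rather than k times.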

17.

Background and Aims

Proton pump inhibitors (PPIs) have been associated with adverse clinical outcomes amongst clopidogrel users after an acute coronary syndrome. Recent pre-clinical results suggest that this risk might extend to subjects without any prior history of cardiovascular disease. We explore this potential risk in the general population via data-mining approaches.

Methods

Using a novel approach for mining clinical data for pharmacovigilance, we queried over 16 million clinical documents on 2.9 million individuals to examine whether PPI usage was associated with cardiovascular risk in the general population.

Results

In multiple data sources, we found that gastroesophageal reflux disease (GERD) patients exposed to PPIs had a 1.16-fold increased association (95% CI 1.09–1.24) with myocardial infarction (MI). Survival analysis in a prospective cohort found a two-fold (HR = 2.00; 95% CI 1.07–3.78; P = 0.031) increase in association with cardiovascular mortality. We found that this association exists regardless of clopidogrel use. We also found that H2 blockers, an alternate treatment for GERD, were not associated with increased cardiovascular risk. Had such pharmacovigilance algorithms been in place, this risk could have been flagged as early as the year 2000.

Conclusions

Consistent with our pre-clinical findings that PPIs may adversely impact vascular function, our data-mining study supports the association of PPI exposure with risk for MI in the general population. These data provide an example of how a combination of experimental studies and data-mining approaches can be applied to prioritize drug safety signals for further investigation.
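The kind of association estimate reported above can be sketched as a 2x2-table odds ratio with a Wald confidence interval. The counts below are invented for illustration (chosen to give an effect size near the reported 1.16) and are not the study's data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with Wald 95% CI from a 2x2 exposure-by-outcome table.

    a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log odds ratio
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: PPI-exposed MI / no-MI vs. unexposed MI / no-MI
print(odds_ratio_ci(1160, 98840, 1000, 99000))
```

With these invented counts the point estimate is about 1.16 with a confidence interval excluding 1, mirroring the shape (though not the substance) of the reported result.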

18.
The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression and unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created; these help draw biological conclusions and have been described in several patents.

19.
sMOL Explorer is a 2D ligand-based computational tool that provides three major functionalities through a Web interface: data management; information retrieval and extraction; and statistical analysis and data mining. With sMOL Explorer, users can create personal databases by adding each small molecule via a drawing interface or by uploading data files from internal and external projects into the sMOL database. The database can then be browsed and queried with textual and structural similarity searches. A molecule can also be submitted for searching against external public databases, including PubChem, KEGG, DrugBank and eMolecules. Moreover, users can easily access a variety of data-mining tools from the Weka and R packages to perform analyses including (1) finding frequent substructures, (2) clustering molecular fingerprints, (3) identifying and removing irrelevant attributes from the data and (4) building classification models of biological activity. AVAILABILITY: sMOL Explorer is an Open Source project and is freely available to all interested users at http://www.biotec.or.th/ISL/SMOL/.

20.
Lo SL, You T, Lin Q, Joshi SB, Chung MC, Hew CL. Proteomics 2006, 6(6):1758-1769
In the field of proteomics, the increasing difficulty of unifying data formats, due to differing platforms/instrumentation and laboratory documentation systems, greatly hinders experimental data verification, exchange and comparison. It is therefore essential to establish standard formats for every necessary aspect of proteomics data. One of the recently published data models is the proteomics experiment data repository [Taylor, C. F., Paton, N. W., Garwood, K. L., Kirby, P. D. et al., Nat. Biotechnol. 2003, 21, 247-254]. Compliant with this format, we developed the systematic proteomics laboratory analysis and storage hub (SPLASH) database system as an informatics infrastructure to support proteomics studies. It consists of three modules and provides proteomics researchers with a common platform to store, manage, search, analyze and exchange their data. (i) Data maintenance includes experimental data entry and update, uploading of experimental results in batch mode, and data exchange in the original PEDRo format. (ii) The data search module provides several means of searching the database, to view either protein information or a differential expression display by clicking on a gel image. (iii) The data mining module contains tools that perform biochemical pathway, statistics-associated gene ontology, and other comparative analyses for all sample sets to interpret their biological meaning. These features make SPLASH a practical and powerful tool for the proteomics community.
