20 similar articles found (search time: 0 ms)
1.
Smilde AK, Timmerman ME, Hendriks MM, Jansen JJ, Hoefsloot HC 《Briefings in bioinformatics》2012,13(5):524-535
In functional genomics it is more rule than exception that experimental designs are used to generate the data. The samples of the resulting data sets are thus organized according to this design and for each sample many biochemical compounds are measured, e.g. typically thousands of gene-expressions or hundreds of metabolites. This results in high-dimensional data sets with an underlying experimental design. Several methods have recently become available for analyzing such data while utilizing the underlying design. We review these methods by putting them in a unifying and general framework to facilitate understanding the (dis-)similarities between the methods. The biological question dictates which method to use and the framework allows for building new methods to accommodate a range of such biological questions. The framework is built on well known fixed-effect ANOVA models and subsequent dimension reduction. We present the framework both in matrix algebra and in more insightful geometrical terms. We show the workings of the different special cases of our framework with a real-life metabolomics example from nutritional research and a gene-expression example from the field of virology.
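As a rough illustration of the "ANOVA model followed by dimension reduction" idea described in this abstract, the sketch below splits a small data matrix into a grand mean, a design-effect part and a residual part for a single factor, then extracts the leading component of the effect part. It is a minimal sketch under stated assumptions, not the authors' framework: the function names, the toy data, the restriction to one balanced factor, and the use of power iteration in place of a full PCA are all choices made here for illustration.

```python
# Illustrative sketch only: one-factor ANOVA split followed by a
# one-component dimension reduction. All names and data are hypothetical.

def column_means(rows):
    """Mean of each column of a list-of-lists matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def anova_split(X, groups):
    """Split X (samples x variables) into grand mean, effect, and residual."""
    grand = column_means(X)
    centered = [[x - m for x, m in zip(row, grand)] for row in X]
    # Mean of the centered rows within each level of the design factor.
    level_means = {
        g: column_means([r for r, lab in zip(centered, groups) if lab == g])
        for g in set(groups)
    }
    effect = [list(level_means[lab]) for lab in groups]
    residual = [[c - e for c, e in zip(crow, erow)]
                for crow, erow in zip(centered, effect)]
    return grand, effect, residual

def first_component(M, iters=100):
    """Leading loading vector of M via power iteration on M^T M."""
    p = len(M[0])
    v = [1.0] * p  # assumed start; fails if orthogonal to the leading direction
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in M]  # M v
        w = [sum(s * row[j] for s, row in zip(scores, M))
             for j in range(p)]                                     # M^T (M v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Toy data: four samples, two variables, two design levels.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 10.0], [7.0, 12.0]]
groups = ["a", "a", "b", "b"]
grand, effect, residual = anova_split(X, groups)
loading = first_component(effect)
```

Each row of `X` equals `grand` plus the matching rows of `effect` and `residual`, so the dimension reduction step sees only the variation attributable to the design factor.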
2.
MOTIVATION: Accurately identifying protein function on a proteome-wide scale requires integrating data within and between high-throughput experiments. High-throughput proteomic datasets often have high error rates and thus yield incomplete and contradictory information. In this study, we develop a simple statistical framework using Bayes' law to interpret such data and combine information from different high-throughput experiments. To illustrate our approach, we apply it to two protein complex purification datasets. RESULTS: Our approach shows how to use high-throughput data to accurately calculate the probability that two proteins are part of the same complex. Importantly, our approach does not need a reference set of verified protein interactions to determine false positive and false negative error rates of protein association. We also demonstrate how to combine information from two separate protein purification datasets into a combined dataset that has greater coverage and accuracy than either dataset alone. We also provide a technique for estimating the total number of proteins that can be detected using a particular experimental technique. AVAILABILITY: A suite of simple programs to accomplish some of the above tasks is available at www.unm.edu/~compbio/software/DatasetAssess
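To make the Bayes' law step concrete, the sketch below combines evidence from two experiments into one posterior probability that a protein pair shares a complex. It is a minimal sketch under stated assumptions, not the authors' procedure: it takes each experiment's true-positive and false-positive rates as given inputs (the abstract's method estimates error rates without a reference set, which is not reproduced here), it assumes the experiments are conditionally independent, and the function name and all numbers are hypothetical.

```python
# Illustrative sketch only: Bayesian combination of two noisy experiments.
# Error rates and the prior are made-up inputs, not values from the paper.

def posterior_same_complex(prior, evidence):
    """Posterior probability that two proteins share a complex.

    prior    -- prior probability of co-membership, strictly between 0 and 1
    evidence -- list of (observed, tpr, fpr) tuples, one per experiment,
                where tpr = P(observed | same complex) and
                      fpr = P(observed | different complexes),
                both strictly between 0 and 1
    Assumes experiments are conditionally independent given the truth.
    """
    odds = prior / (1.0 - prior)
    for observed, tpr, fpr in evidence:
        if observed:
            odds *= tpr / fpr                    # likelihood ratio of a hit
        else:
            odds *= (1.0 - tpr) / (1.0 - fpr)    # likelihood ratio of a miss
    return odds / (1.0 + odds)

# A pair seen together in both hypothetical purification datasets.
posterior = posterior_same_complex(
    prior=0.01,
    evidence=[(True, 0.5, 0.01), (True, 0.4, 0.02)],
)
```

Working in odds form keeps the update for each additional dataset to a single multiplication, which is why combining experiments is straightforward once the error rates are known.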
3.
4.
5.
In this paper, we present a multi-agent framework for data mining in electromyography. This application, based on a web interface, provides a set of functions for manipulating 1000 medical cases and more than 25,000 neurological tests stored in a medical database. The aim is to extract medical information using data mining algorithms and to supply a knowledge base with pertinent information. The multi-agent platform makes it possible to distribute the data management process among several autonomous entities. This framework provides parallel and flexible data manipulation.
6.
Background
Biclustering algorithms can find a number of co-expressed genes under a set of experimental conditions. Recently, differential co-expression bicluster mining has been used to infer reasonable patterns in two microarray datasets, such as normal and cancer cells.
Methods
In this paper, we propose an algorithm, DECluster, to mine Differential co-Expression biClusters in two discretized microarray datasets. First, DECluster produces the differentially co-expressed genes from each pair of samples in the two microarray datasets and constructs a differential weighted undirected sample–sample relational graph. Second, the differential biclusters are generated from this graph. To mine maximal differential co-expression biclusters efficiently, we design several pruning techniques for generating maximal biclusters without candidate maintenance.
Results
The experimental results show that our algorithm is more efficient than existing methods. The performance of DECluster is evaluated by empirical p-value and gene ontology; the results show that our algorithm can find more statistically significant and biologically meaningful differential co-expression biclusters than other algorithms.
Conclusions
Our proposed algorithm can find more statistically significant and biologically meaningful biclusters in two microarray datasets than the other two algorithms.
7.
Rita Gupta, Anna Stincone, Philipp Antczak, Sarah Durant, Roy Bicknell, Andreas Bikfalvi, Francesco Falciani 《BMC systems biology》2011,5(1):52
Background
Reverse engineering in systems biology entails inference of gene regulatory networks from observational data. These data typically include gene expression measurements of wild type and mutant cells in response to a given stimulus. It has been shown that when more than one type of experiment is used in the network inference process the accuracy is higher. Therefore the development of generally applicable and effective methodologies that embed multiple sources of information in a single computational framework is a worthwhile objective.
8.
We develop a statistical framework to study the relationship between chromatin features and gene expression. This can be used to predict gene expression of protein coding genes, as well as microRNAs. We demonstrate the prediction in a variety of contexts, focusing particularly on the modENCODE worm datasets. Moreover, our framework reveals the positional contribution around genes (upstream or downstream) of distinct chromatin features to the overall prediction of expression levels.
9.
10.
11.
12.
R Hosur, J Peng, A Vinayagam, U Stelzl, J Xu, N Perrimon, J Bienkowska, B Berger 《Genome biology》2012,13(8):R76
ABSTRACT: Improving the quality and coverage of the protein interactome is of paramount importance for biomedical research, particularly given the various sources of uncertainty in high-throughput techniques. We introduce a structure-based framework, Coev2Net, for computing a single confidence score that addresses both false positive and false negative rates. Coev2Net is easily applied to thousands of binary protein interactions and has superior predictive performance over existing methods. We experimentally validate selected high-confidence predictions in the human MAPK network and show that predicted interfaces are enriched for cancer-related or damaging SNPs. Coev2Net can be downloaded at http://struct2net.csail.mit.edu/
13.
14.
15.
We describe RosettaRemodel, a generalized framework for flexible protein design that provides a versatile and convenient interface to the Rosetta modeling suite. RosettaRemodel employs a unified interface, called a blueprint, which allows detailed control over many aspects of flexible backbone protein design calculations. RosettaRemodel allows the construction and elaboration of customized protocols for a wide range of design problems ranging from loop insertion and deletion, disulfide engineering, domain assembly, loop remodeling, motif grafting, symmetrical units, to de novo structure modeling.
16.
Adaptive dynamics describes the evolution of games where the strategies are continuous functions of some parameters. The standard adaptive dynamics framework assumes that the population is homogeneous at any one time. Differential equations point to the direction of the mutant that has maximum payoff against the resident population. The population then moves towards this mutant. The standard adaptive dynamics formulation cannot deal with games in which the payoff is not differentiable. Here we present a generalized framework which can. We assume that the population is not homogeneous but distributed around an average strategy. This approach can describe the long-term dynamics of the Ultimatum Game and also explain the evolution of fairness in a one-parameter Ultimatum Game.
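For orientation, the standard formulation that this abstract refers to is commonly written as a gradient equation. The notation below is an assumption made here, not taken from the paper: x is the resident strategy, y a mutant strategy, and F(y, x) the payoff of the mutant against the resident. Rate constants are omitted.

```latex
\dot{x} \;=\; \left. \frac{\partial F(y, x)}{\partial y} \right|_{y = x}
```

The derivative on the right-hand side must exist for this equation to be usable, which is why a payoff that is not differentiable falls outside the standard formulation.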
17.
18.
Background
New data mining algorithms are continuously being proposed as related disciplines develop. These algorithms differ in their scope of application and in their performance. Hence, finding a suitable algorithm for a dataset is becoming an important concern for biomedical researchers who need to solve practical problems promptly.
Methods
In this paper, seven widely used algorithms, namely C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets on classification tasks, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets.
Results
The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced multi-class dataset. Simple algorithms, such as naïve Bayes and logistic regression, are suitable for a small dataset with high correlation between the task variable and the other attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is better suited to the balanced small dataset of the binary-class task.
Conclusions
No algorithm maintains the best performance on all datasets. The applicability of the seven data mining algorithms to datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
19.
We present a noise-robust PCA algorithm that extends the Oja subspace algorithm and allows the noise sensitivity to be tuned. We derive a loss function that is minimized by this algorithm and interpret it in a noisy PCA setting. Results on the local stability analysis of this algorithm are given, and it is shown that the locally stable equilibria are those that minimize the loss function.
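For readers unfamiliar with the baseline that this abstract extends, the sketch below implements the standard Oja subspace rule, W ← W + η (x − W y) yᵀ with y = Wᵀx. It does not include the noise-robust extension or its tunable sensitivity, since the abstract does not specify them. The function names, learning rate, starting matrix and synthetic data are assumptions made here for illustration.

```python
# Illustrative sketch only: the standard Oja subspace rule, not the
# noise-robust extension described in the abstract. All settings are assumed.
import random

def oja_subspace_step(W, x, eta):
    """One in-place update W <- W + eta * (x - W y) y^T, where y = W^T x."""
    p, k = len(W), len(W[0])
    y = [sum(W[i][j] * x[i] for i in range(p)) for j in range(k)]      # W^T x
    recon = [sum(W[i][j] * y[j] for j in range(k)) for i in range(p)]  # W y
    for i in range(p):
        err = x[i] - recon[i]
        for j in range(k):
            W[i][j] += eta * err * y[j]
    return W

def train(samples, W, eta=0.01):
    """Apply the update once per sample, in order."""
    for x in samples:
        oja_subspace_step(W, x, eta)
    return W

# Synthetic zero-mean data: large variance on axes 0 and 1, tiny on axis 2.
rng = random.Random(0)
samples = [[rng.uniform(-3, 3), rng.uniform(-2, 2), rng.uniform(-0.1, 0.1)]
           for _ in range(5000)]

# 3 x 2 weight matrix; its columns should come to span axes 0 and 1.
W = train(samples, [[0.5, 0.0], [0.0, 0.5], [0.1, -0.1]])

# Gram matrix W^T W; the rule drives it towards the identity.
gram = [[sum(W[i][a] * W[i][b] for i in range(3)) for b in range(2)]
        for a in range(2)]
```

The rule recovers an orthonormal basis of the principal subspace rather than the individual principal axes, so the columns of `W` may be any rotation of the two leading directions.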