1.

Generic framework for high-dimensional fixed-effects ANOVA

Smilde AK Timmerman ME Hendriks MM Jansen JJ Hoefsloot HC 《Briefings in bioinformatics》2012,13(5):524-535

In functional genomics it is the rule rather than the exception that experimental designs are used to generate the data. The samples of the resulting data sets are thus organized according to this design, and for each sample many biochemical compounds are measured, typically thousands of gene expressions or hundreds of metabolites. This results in high-dimensional data sets with an underlying experimental design. Several methods have recently become available for analyzing such data while utilizing the underlying design. We review these methods by putting them in a unifying and general framework to facilitate understanding the (dis-)similarities between the methods. The biological question dictates which method to use, and the framework allows for building new methods to accommodate a range of such biological questions. The framework is built on well-known fixed-effect ANOVA models and subsequent dimension reduction. We present the framework both in matrix algebra and in more insightful geometrical terms. We show the workings of the different special cases of our framework with a real-life metabolomics example from nutritional research and a gene-expression example from the field of virology.
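The idea of "fixed-effect ANOVA followed by dimension reduction" can be sketched for its simplest special case, a single design factor, in the spirit of ASCA (ANOVA-simultaneous component analysis). The sketch splits a samples × variables matrix into a grand mean, a factor-effect matrix and residuals, then extracts the leading principal axis of the effect matrix. This is a minimal illustration under that one-factor assumption, not the authors' general framework. The function names are hypothetical, and the PCA step uses plain power iteration so that the example has no dependencies.

```python
import math
from collections import defaultdict


def column_means(rows):
    """Mean of each column of a matrix stored as a list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def anova_decompose(X, groups):
    """One-factor fixed-effects ANOVA split of X (samples x variables).

    Returns (grand_mean, effect, residual) such that, row by row,
    X[i] = grand_mean + effect[i] + residual[i], where effect[i] is the
    mean of sample i's group minus the grand mean.
    """
    grand = column_means(X)
    members = defaultdict(list)
    for row, g in zip(X, groups):
        members[g].append(row)
    group_mean = {g: column_means(rows) for g, rows in members.items()}
    effect = [[gm - m for gm, m in zip(group_mean[g], grand)] for g in groups]
    residual = [[x - m - e for x, m, e in zip(row, grand, eff)]
                for row, eff in zip(X, effect)]
    return grand, effect, residual


def leading_axis(M, iters=200):
    """First principal axis of a column-centred matrix M (power iteration on M^T M)."""
    p = len(M[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in M]            # M v
        w = [sum(s * row[j] for s, row in zip(scores, M)) for j in range(p)]  # M^T (M v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # v is orthogonal to every row; nothing left to refine
            break
        v = [x / norm for x in w]
    return v


# Toy data: 4 samples, 2 variables, one factor with two levels.
X = [[1.0, 2.0], [1.2, 2.1], [3.0, 0.5], [3.2, 0.4]]
groups = ["control", "control", "diet", "diet"]
grand, effect, residual = anova_decompose(X, groups)
axis = leading_axis(effect)
```

Because the effect matrix of a one-factor design is already column-centred, its principal axes summarize how the group means differ across all variables at once. A fuller treatment would add further factors and interactions as extra effect matrices.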

2.

A statistical framework for combining and interpreting proteomic datasets

Gilchrist MA Salter LA Wagner A 《Bioinformatics (Oxford, England)》2004,20(5):689-700

MOTIVATION: Accurately identifying protein function on a proteome-wide scale requires integrating data within and between high-throughput experiments. High-throughput proteomic datasets often have high error rates and thus yield incomplete and contradictory information. In this study, we develop a simple statistical framework using Bayes' law to interpret such data and combine information from different high-throughput experiments. To illustrate our approach, we apply it to two protein complex purification datasets. RESULTS: Our approach shows how to use high-throughput data to calculate accurately the probability that two proteins are part of the same complex. Importantly, our approach does not need a reference set of verified protein interactions to determine the false positive and false negative error rates of protein association. We also demonstrate how to combine information from two separate protein purification datasets into a combined dataset that has greater coverage and accuracy than either dataset alone. In addition, we provide a technique for estimating the total number of proteins that can be detected using a particular experimental technique. AVAILABILITY: A suite of simple programs to accomplish some of the above tasks is available at www.unm.edu/~compbio/software/DatasetAssess
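The sketch below illustrates how Bayes' law combines evidence. It computes the posterior probability that a protein pair belongs to the same complex from several experiments, assuming known per-experiment false-positive and false-negative rates and conditional independence between experiments. These assumptions are ours. The paper's contribution is precisely that it estimates error rates without a reference set, which this sketch does not attempt. The function name and the example rates are hypothetical.

```python
def posterior_same_complex(prior, observations):
    """P(pair is in the same complex | observations) by Bayes' law.

    prior: prior probability that a random pair shares a complex.
    observations: (detected, fp_rate, fn_rate) per experiment, where
      fp_rate = P(detected | not in same complex) and
      fn_rate = P(not detected | in same complex).
    Experiments are assumed conditionally independent given the true state.
    """
    like_same = 1.0  # P(observations | same complex)
    like_diff = 1.0  # P(observations | different complexes)
    for detected, fp_rate, fn_rate in observations:
        if detected:
            like_same *= 1.0 - fn_rate
            like_diff *= fp_rate
        else:
            like_same *= fn_rate
            like_diff *= 1.0 - fp_rate
    numerator = prior * like_same
    return numerator / (numerator + (1.0 - prior) * like_diff)


# Hypothetical numbers: a pair seen in one screen, then also in a second one.
one = posterior_same_complex(0.01, [(True, 0.05, 0.3)])
both = posterior_same_complex(0.01, [(True, 0.05, 0.3), (True, 0.10, 0.4)])
```

With these made-up rates, a single detection lifts the probability from 1% to about 12%, and a second independent detection lifts it to about 46%. This shows why pooling datasets can increase accuracy.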

3.

A generalized framework for network component analysis

Boscolo R Sabatti C Liao JC Roychowdhury VP 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2005,2(4):289-301

4.

A ridge-based framework for segmentation of 3D electron microscopy datasets

Antonio Martinez-Sanchez Inmaculada Garcia Jose-Jesus Fernandez 《Journal of structural biology》2013,181(1):61-70

5.

A knowledge-driven agent-centred framework for data mining in EMG

Balter J Labarre-Vila A Ziébelin D Garbay C 《Comptes rendus biologies》2002,325(4):375-382

In this paper, we present a multi-agent framework for data mining in electromyography. This application, based on a web interface, provides a set of functionalities that allow users to manipulate 1000 medical cases and more than 25,000 neurological tests stored in a medical database. The aim is to extract medical information using data mining algorithms and to supply a knowledge base with pertinent information. The multi-agent platform makes it possible to distribute the data management process among several autonomous entities. This framework provides parallel and flexible data manipulation.

6.

Efficient mining differential co-expression biclusters in microarray datasets

Miao Wang Xuequn Shang Xiaoyuan Li Wenbin Liu Zhanhuai Li 《Gene》2013

Background

Biclustering algorithms can find groups of genes that are co-expressed under a set of experimental conditions. Recently, differential co-expression bicluster mining has been used to infer meaningful patterns that differ between two microarray datasets, such as those from normal and cancer cells.

Methods

In this paper, we propose an algorithm, DECluster, to mine Differential co-Expression biClusters in two discretized microarray datasets. First, DECluster determines the differentially co-expressed genes for each pair of samples in the two microarray datasets and constructs a differential weighted undirected sample–sample relational graph. Second, the differential biclusters are generated from this graph. To mine maximal differential co-expression biclusters efficiently, we design several pruning techniques that generate maximal biclusters without candidate maintenance.

Results

The experimental results show that our algorithm is more efficient than existing methods. The performance of DECluster is evaluated using empirical p-values and the Gene Ontology; the results show that our algorithm can find more statistically significant and biologically meaningful differential co-expression biclusters than other algorithms.

Conclusions

Our proposed algorithm can find more statistically significant and biologically meaningful biclusters in two microarray datasets than the other two algorithms.

7.

A computational framework for gene regulatory network inference that combines multiple methods and datasets

Rita Gupta Anna Stincone Philipp Antczak Sarah Durant Roy Bicknell Andreas Bikfalvi Francesco Falciani 《BMC systems biology》2011,5(1):52

Background

Reverse engineering in systems biology entails inference of gene regulatory networks from observational data. These data typically include gene expression measurements of wild-type and mutant cells in response to a given stimulus. It has been shown that network inference is more accurate when more than one type of experiment is used. Therefore, the development of generally applicable and effective methodologies that embed multiple sources of information in a single computational framework is a worthwhile objective.

8.

A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets

Cheng C Yan KK Yip KY Rozowsky J Alexander R Shou C Gerstein M 《Genome biology》2011,12(2):R15

We develop a statistical framework to study the relationship between chromatin features and gene expression. This can be used to predict the expression of protein-coding genes as well as microRNAs. We demonstrate the prediction in a variety of contexts, focusing particularly on the modENCODE worm datasets. Moreover, our framework reveals the positional contribution (upstream or downstream of genes) of distinct chromatin features to the overall prediction of expression levels.

9.

Generalized random set framework for functional enrichment analysis using primary genomics datasets

Freudenberg JM Sivaganesan S Phatak M Shinde K Medvedovic M 《Bioinformatics (Oxford, England)》2011,27(1):70-77

10.

ChemmineR: a compound mining framework for R

Cao Y Charisi A Cheng LC Jiang T Girke T 《Bioinformatics (Oxford, England)》2008,24(15):1733-1734

11.

A computational framework for the inheritance pattern of genomic imprinting for complex traits

Wang C Wang Z Prows DR Wu R 《Briefings in bioinformatics》2012,13(1):34-45

12.

Coev2Net: a computational framework for boosting confidence in high-throughput protein-protein interaction datasets

R Hosur J Peng A Vinayagam U Stelzl J Xu N Perrimon J Bienkowska B Berger 《Genome biology》2012,13(8):R76

ABSTRACT: Improving the quality and coverage of the protein interactome is of paramount importance for biomedical research, particularly given the various sources of uncertainty in high-throughput techniques. We introduce a structure-based framework, Coev2Net, for computing a single confidence score that addresses both false positive and false negative rates. Coev2Net is easily applied to thousands of binary protein interactions and has better predictive performance than existing methods. We experimentally validate selected high-confidence predictions in the human MAPK network and show that predicted interfaces are enriched for cancer-related or damaging SNPs. Coev2Net can be downloaded at http://struct2net.csail.mit.edu/

13.

TreeDT: tree pattern mining for gene mapping

Sevon P Toivonen H Ollikainen V 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(2):174-185

14.

A phylogenetic framework for wing pattern evolution in the mimetic Mocker Swallowtail Papilio dardanus

Rebecca Clark Alfried P. Vogler 《Molecular ecology》2009,18(18):3872-3884

15.

RosettaRemodel: a generalized framework for flexible backbone protein design

Huang PS Ban YE Richter F Andre I Vernon R Schief WR Baker D 《PloS one》2011,6(8):e24109

We describe RosettaRemodel, a generalized framework for flexible protein design that provides a versatile and convenient interface to the Rosetta modeling suite. RosettaRemodel employs a unified interface, called a blueprint, which allows detailed control over many aspects of flexible backbone protein design calculations. RosettaRemodel allows the construction and elaboration of customized protocols for a wide range of design problems ranging from loop insertion and deletion, disulfide engineering, domain assembly, loop remodeling, motif grafting, symmetrical units, to de novo structure modeling.

16.

A generalized adaptive dynamics framework can describe the evolutionary Ultimatum Game

Page KM Nowak MA 《Journal of theoretical biology》2001,209(2):173-179

Adaptive dynamics describes the evolution of games in which the strategies are continuous functions of some parameters. The standard adaptive dynamics framework assumes that the population is homogeneous at any one time. Differential equations point in the direction of the mutant that has the maximum payoff against the resident population. The population then moves towards this mutant. The standard adaptive dynamics formulation cannot deal with games in which the payoff is not differentiable. Here we present a generalized framework that can. We assume that the population is not homogeneous but distributed around an average strategy. This approach can describe the long-term dynamics of the Ultimatum Game and also explain the evolution of fairness in a one-parameter Ultimatum Game.
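The following shows why the standard formulation breaks down. It combines the canonical adaptive-dynamics equation for a homogeneous resident strategy s with one common textbook form of the Ultimatum Game payoff. In that form, a strategy (p, q) offers p as proposer and accepts any offer of at least q as responder. Both roles are assumed equally likely, and the constant factor 1/2 is dropped. This is our illustration of the problem the abstract describes, not the authors' generalized framework; the notation P, H and δ is ours.

```latex
% Canonical adaptive dynamics: the resident strategy s climbs the payoff
% gradient of a rare mutant s' evaluated at s' = s.
\dot{s} \propto \left.\frac{\partial P(s', s)}{\partial s'}\right|_{s' = s}

% Ultimatum Game payoff of mutant (p', q') against resident (p, q),
% with H the Heaviside step function:
P\big((p', q'), (p, q)\big) = (1 - p')\,H(p' - q) + p\,H(p - q')

% Its partial derivatives contain Dirac deltas:
\frac{\partial P}{\partial p'} = -H(p' - q) + (1 - p')\,\delta(p' - q),
\qquad
\frac{\partial P}{\partial q'} = -p\,\delta(p - q')
% At s' = s both deltas sit at p = q, so the gradient is not an ordinary
% function there and the canonical equation is undefined.
```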

17.

A Java API for working with PubChem datasets

Southern MR Griffin PR 《Bioinformatics (Oxford, England)》2011,27(5):741-742

18.

Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications

Yiyan Zhang Yi Xin Qin Li Jianshe Ma Shuai Li Xiaodan Lv Weiqi Lv 《Biomedical engineering online》2017,16(1):125

Background

New data mining algorithms are continually being proposed as related disciplines develop, and these algorithms differ in their scope of applicability and in their performance. Hence, finding a suitable algorithm for a given dataset has become an important concern for biomedical researchers who need to solve practical problems promptly.

Methods

In this paper, seven mature and widely used algorithms, namely C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to 12 top-click (most frequently accessed) public UCI datasets for classification tasks, and their performances were compared through induction and analysis. To characterize the 12 research datasets, the following were calculated: the sample size, the number of attributes, the number of missing values, the sample size of each class, the correlation coefficients between variables, the class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class.

Results

The two ensemble algorithms achieve high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset with a multi-class task. Simple algorithms, such as naïve Bayes and logistic regression, are suitable for small datasets with high correlation between the task variable and the other (non-task) attribute variables. The k-nearest neighbor and C4.5 decision tree algorithms perform well on both binary- and multi-class task datasets. Support vector machine is better suited to small, balanced datasets with a binary-class task.

Conclusions

No algorithm maintains the best performance on all datasets. The applicability of the seven data mining algorithms to datasets with different characteristics was summarized to provide a reference for biomedical researchers and for beginners in different fields.
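Several of the dataset characteristics listed in the Methods are straightforward to compute. The sketch below counts missing values and class sizes, and computes the class entropy, the largest-to-smallest class-size ratio and a Pearson correlation between two variables. The helper names are hypothetical. The sketch assumes that class entropy is measured in bits (base-2 logarithm) and that missing values are encoded as `None`; the abstract specifies neither.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class-label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    sizes = Counter(labels).values()
    return max(sizes) / min(sizes)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe(X, y):
    """Summarise a classification dataset X (rows of attributes) with labels y."""
    return {
        "n_samples": len(X),
        "n_attributes": len(X[0]),
        "n_missing": sum(v is None for row in X for v in row),  # None = missing
        "class_sizes": dict(Counter(y)),
        "class_entropy": class_entropy(y),
        "imbalance_ratio": imbalance_ratio(y),
    }


summary = describe([[1.0, None], [2.0, 3.0], [4.0, 5.0], [0.5, 1.0]],
                   ["a", "a", "a", "b"])
```

In this toy example, the 3:1 class split gives an imbalance ratio of 3.0 and a class entropy of about 0.81 bits. A perfectly balanced binary task would give 1.0 bit.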

19.

A robust subspace algorithm for principal component analysis

Weingessel A Hornik K 《International journal of neural systems》2003,13(5):307-313

We present a noise-robust PCA algorithm that extends the Oja subspace algorithm and allows the noise sensitivity to be tuned. We derive a loss function that is minimized by this algorithm and interpret it in a noisy PCA setting. Results of a local stability analysis of this algorithm are given, and it is shown that the locally stable equilibria are those that minimize the loss function.
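For reference, the classic Oja subspace rule that this algorithm extends updates a p × k weight matrix W, for each input x, as y = W^T x followed by W ← W + η (x − W y) y^T. The sketch below implements only that standard rule for zero-mean data. The noise-robust extension and its tuning parameter are not reproduced. The function name, learning rate, epoch count and toy data are illustrative choices of ours.

```python
def oja_subspace(data, k, lr=0.01, epochs=100, init=None):
    """Classic Oja subspace rule for zero-mean data (no noise-robust extension).

    W is a p x k matrix stored as a list of rows. For each sample x:
        y = W^T x,   W <- W + lr * (x - W y) y^T
    The columns of W tend towards an orthonormal basis of the leading
    k-dimensional principal subspace.
    """
    p = len(data[0])
    if init is None:
        # Start from the first k coordinate axes.
        W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(p)]
    else:
        W = [list(row) for row in init]
    for _ in range(epochs):
        for x in data:
            y = [sum(W[i][j] * x[i] for i in range(p)) for j in range(k)]           # W^T x
            err = [x[i] - sum(W[i][j] * y[j] for j in range(k)) for i in range(p)]  # x - W y
            for i in range(p):
                for j in range(k):
                    W[i][j] += lr * err[i] * y[j]
    return W


# Zero-mean toy data stretched along the first coordinate.
data = [(3.0, 0.1), (-3.0, -0.1), (2.0, -0.2), (-2.0, 0.2)]
W = oja_subspace(data, k=1, lr=0.01, epochs=500, init=[[0.5], [0.5]])
w = [W[0][0], W[1][0]]
```

Starting from the diagonal direction, the single weight vector rotates towards the first coordinate axis, which is the dominant principal direction of the toy data. Its length settles near 1, as the stability analysis of Oja-type rules predicts.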

20.

A biclustering algorithm for extracting bit-patterns from binary datasets

Rodriguez-Baena DS Perez-Pulido AJ Aguilar-Ruiz JS 《Bioinformatics (Oxford, England)》2011,27(19):2738-2745
