1.

Using hidden Markov models to analyze gene expression time course data

Schliep A, Schönhuth A, Steinhoff C. Bioinformatics (Oxford, England). 2003;19(Suppl 1):i255-i263.

MOTIVATION: Cellular processes cause changes over time. Observing and measuring those changes over time allows insights into the how and why of regulation. The experimental platform for doing the appropriate large-scale experiments to obtain time-courses of expression levels is provided by microarray technology. However, the proper way of analyzing the resulting time course data is still very much an issue under investigation. The inherent time dependencies in the data suggest that clustering techniques which reflect those dependencies yield improved performance. RESULTS: We propose to use Hidden Markov Models (HMMs) to account for the horizontal dependencies along the time axis in time course data and to cope with the prevalent errors and missing values. The HMMs are used within a model-based clustering framework. We are given a number of clusters, each represented by one Hidden Markov Model from a finite collection encompassing typical qualitative behavior. Then, our method finds in an iterative procedure cluster models and an assignment of data points to these models that maximizes the joint likelihood of clustering and models. Partially supervised learning (adding groups of labeled data to the initial collection of clusters) is supported. A graphical user interface allows querying an expression profile dataset for time courses similar to a prototype graphically defined as a sequence of levels and durations. We also propose a heuristic approach to automate determination of the number of clusters. We evaluate the method on published yeast cell cycle and fibroblast serum response datasets, and compare it, with favorable results, to the autoregressive curves method.
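
The assignment step of such model-based clustering can be sketched as follows, assuming Gaussian emissions and two hand-set, two-state left-to-right models (the `UP`/`DOWN` parameters and every function name are hypothetical, not taken from the paper or its software). Each profile is scored with the forward algorithm and assigned to its most likely model; a missing value (`None`) contributes no emission term, so it is marginalised out. Re-estimating each cluster's HMM (e.g. by Baum-Welch), partially supervised learning and the cluster-number heuristic are not shown.

```python
import math

def safe_log(p):
    # log(p), with log(0) mapped to -infinity for forbidden starts/transitions
    return math.log(p) if p > 0 else -math.inf

def log_gauss(x, mean, sd):
    # Log density of N(mean, sd^2) evaluated at x
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def logsumexp(values):
    # Numerically stable log(sum(exp(v) for v in values))
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(profile, start, trans, means, sds):
    """Forward algorithm: log P(profile | HMM) for a Gaussian-emission HMM.
    A missing time point (None) adds no emission term."""
    states = range(len(start))

    def emit(s, x):
        return 0.0 if x is None else log_gauss(x, means[s], sds[s])

    alpha = [safe_log(start[s]) + emit(s, profile[0]) for s in states]
    for x in profile[1:]:
        alpha = [logsumexp([alpha[r] + safe_log(trans[r][s]) for r in states]) + emit(s, x)
                 for s in states]
    return logsumexp(alpha)

# Hypothetical prototype models: (start probs, transition matrix, state means, state sds).
# Both are left-to-right chains starting near 0 and moving to +2 (UP) or -2 (DOWN).
UP = ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [0.0, 2.0], [0.5, 0.5])
DOWN = ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [0.0, -2.0], [0.5, 0.5])

def assign(profiles, models):
    # Hard assignment: index of the model with the highest log-likelihood for each profile
    return [max(range(len(models)), key=lambda k: log_likelihood(p, *models[k]))
            for p in profiles]

print(assign([[0.1, 0.0, 1.9, 2.1], [0.0, None, -2.0, -1.8]], [UP, DOWN]))  # → [0, 1]
```

Alternating this assignment with per-cluster re-estimation gives a k-means-like iteration that increases the joint likelihood.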

2.

A data-driven clustering method for time course gene expression data (total citations: 1; self-citations: 0; citations by others: 1)

Ma P, Castillo-Davis CI, Zhong W, Liu JS. Nucleic Acids Research. 2006;34(4):1261-1269.

Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a 'mean curve' construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).

3.

Group SCAD regression analysis for microarray time course gene expression data (total citations: 1; self-citations: 0; citations by others: 1)

Wang L, Chen G, Li H. Bioinformatics (Oxford, England). 2007;23(12):1486-1494.

4.

MixMir: microRNA motif discovery from gene expression data using mixed linear models

Diao L, Marcais A, Norton S, Chen KC. Nucleic Acids Research. 2014;42(17):e135.

5.

Biologically valid linear factor models of gene expression

Girolami M, Breitling R. Bioinformatics (Oxford, England). 2004;20(17):3021-3033.

MOTIVATION: The identification of physiological processes underlying and generating the expression pattern observed in microarray experiments is a major challenge. Principal component analysis (PCA) is a linear multivariate statistical method that is regularly employed for that purpose as it provides a reduced-dimensional representation for subsequent study of possible biological processes responding to the particular experimental conditions. Making explicit the data assumptions underlying PCA highlights their lack of biological validity, thus making biological interpretation of the principal components problematic. A microarray data representation which enables clear biological interpretation is a desirable analysis tool. RESULTS: We address this issue by employing the probabilistic interpretation of PCA and proposing alternative linear factor models which are based on refined biological assumptions. A practical study on two well-understood microarray datasets highlights the weakness of PCA and the greater biological interpretability of the linear models we have developed.
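
For reference, the classical PCA that the abstract starts from can be sketched in a few lines: centre the data, form the sample covariance matrix and extract its leading eigenvector by power iteration. This is plain PCA only, with a hypothetical function name; neither the probabilistic PCA formulation nor the authors' alternative factor models are reproduced, and the power iteration assumes the all-ones starting vector is not orthogonal to the leading eigenvector.

```python
import math

def first_principal_component(rows, iters=200):
    """Leading eigenvector and eigenvalue of the sample covariance matrix.
    rows: list of observations, each a list of p variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # starting vector for power iteration
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance of the data projected onto v
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, lam

v, lam = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(v, lam)  # direction (1, 2)/sqrt(5), variance 25/3
```

Because the toy points lie exactly on a line, the first component captures all of the variance; real expression matrices spread variance over many components.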

6.

h-Profile plots for the discovery and exploration of patterns in gene expression data with an application to time course data

Pittelkow YE, Wilson SR. BMC Bioinformatics. 2007;8(1):486.

Background

An ever-increasing number of techniques are being used to find genes with similar profiles from microarray studies. Visualization of gene expression profiles can aid this process, potentially contributing to the identification of co-regulated genes and gene function as well as network development.

7.

Statistical inference of transcriptional module-based gene networks from time course gene expression profiles by using state space models (total citations: 2; self-citations: 0; citations by others: 2)

Hirose O, Yoshida R, Imoto S, Yamaguchi R, Higuchi T, Charnock-Jones DS, Print C, Miyano S. Bioinformatics (Oxford, England). 2008;24(7):932-942.

8.

Cluster-Rasch models for microarray gene expression data (total citations: 1; self-citations: 0; citations by others: 1)

Li H, Hong F. Genome Biology. 2001;2(8):research0031.1-research0031.13.

Background

We propose two different formulations of the Rasch statistical model for the problem of relating gene expression profiles to phenotypes. One formulation allows us to investigate whether a cluster of genes with similar expression profiles is related to the observed phenotypes; this model can also be used for future prediction. The other formulation provides an alternative way of identifying genes that are over- or underexpressed, based on their expression levels in tissue or cell samples of a given tissue or cell type.

Results

We illustrate the methods on available datasets of a classification of acute leukemias and of 60 cancer cell lines. For tumor classification, the results are comparable to those previously obtained. For the cancer cell lines dataset, we found four clusters of genes that are related to drug response for many of the 90 drugs that we considered. In addition, for each type of cell line, we identified genes that are over- or underexpressed relative to other genes.

Conclusions

The cluster-Rasch model provides a probabilistic model for describing gene expression patterns across samples and can be used to relate gene expression profiles to phenotypes.
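
In its standard form, the Rasch model gives the probability of a binary response from a row ("person") parameter θ and a column ("item") parameter β as P(X = 1) = exp(θ − β) / (1 + exp(θ − β)). The sketch below evaluates that probability and the log-likelihood of a 0/1 matrix for given parameters. How rows, columns and the binary outcome map onto genes, samples and phenotypes in the two cluster-Rasch formulations is not specified here, so the function names and toy inputs are hypothetical; parameter estimation is not shown.

```python
import math

def rasch_prob(theta, beta):
    """P(X = 1) under the Rasch model: logistic function of (theta - beta)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def rasch_log_likelihood(x, thetas, betas):
    """Log-likelihood of a 0/1 matrix x, with one theta per row and one beta
    per column, assuming entries are independent given the parameters."""
    total = 0.0
    for i, row in enumerate(x):
        for j, xij in enumerate(row):
            p = rasch_prob(thetas[i], betas[j])
            total += math.log(p) if xij == 1 else math.log(1.0 - p)
    return total

print(rasch_prob(0.0, 0.0))            # → 0.5
print(rasch_prob(math.log(3.0), 0.0))  # approximately 0.75 (odds of 3 to 1)
```

Only the difference θ − β enters the model, which is why Rasch parameters are identified only up to a common shift.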

9.

Clustering short time series gene expression data

Ernst J, Nau GJ, Bar-Joseph Z. Bioinformatics (Oxford, England). 2005;21(Suppl 1):i159-i168.

10.

Model-based methods for identifying periodically expressed genes based on time course microarray gene expression data (total citations: 4; self-citations: 0; citations by others: 4)

Luan Y, Li H. Bioinformatics (Oxford, England). 2004;20(3):332-339.

11.

Failure time models with matched data

Wild CJ. Biometrika. 1983;70(3):633-641.

12.

Estimation in linear models with censored data (total citations: 1; self-citations: 0; citations by others: 1)

Schneider H, Weissfeld L. Biometrika. 1986;73(3):741-745.

13.

Integration of gene expression data into genome-scale metabolic models (total citations: 4; self-citations: 0; citations by others: 4)

Akesson M, Förster J, Nielsen J. Metabolic Engineering. 2004;6(4):285-293.

14.

Group additive regression models for genomic data analysis

Luan Y, Li H. Biostatistics (Oxford, England). 2008;9(1):100-113.

One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group features that are related to the phenotypes. In addition, the prediction mean square errors are smaller than those of the component-wise boosting procedure. We demonstrate the application of the methods to pathway-based analysis of microarray gene expression data of breast cancer. Results from analysis of a breast cancer microarray gene expression data set indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance are important to breast cancer-specific survival.
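
A minimal sketch of group-wise L2 boosting under stated assumptions: squared-error loss, roughly centred columns, each group fitted to the current residuals by ordinary least squares, and only the group with the smallest residual sum of squares updated, scaled by a fixed shrinkage factor `nu`. The function names, `nu` and `steps` are hypothetical choices, and the paper's group additive models allow more general (nonlinear) group effects than the linear fits used here. With every group of size one, this reduces to component-wise linear boosting.

```python
def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x

def group_boost(X, y, groups, steps=200, nu=0.1):
    """Group gradient boosting with squared-error loss and linear group learners."""
    n, p = len(y), len(X[0])
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    coef = [0.0] * p
    for _ in range(steps):
        best = None
        for g in groups:
            # Least-squares fit of the residuals on the columns in group g
            A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in g] for j in g]
            b = [sum(X[i][j] * resid[i] for i in range(n)) for j in g]
            beta = solve(A, b)
            fitted = [sum(X[i][j] * bj for j, bj in zip(g, beta)) for i in range(n)]
            rss = sum((r - f) ** 2 for r, f in zip(resid, fitted))
            if best is None or rss < best[0]:
                best = (rss, g, beta, fitted)
        _, g, beta, fitted = best
        # Shrunken update of the selected group only
        for j, bj in zip(g, beta):
            coef[j] += nu * bj
        resid = [r - nu * f for r, f in zip(resid, fitted)]
    return intercept, coef

X = [[1, 1, 1], [-1, 1, 1], [1, -1, 1], [-1, -1, 1], [1, 0, -2], [-1, 0, -2]]
y = [3, -1, 1, -3, 2, -2]  # = 2*x0 + x1; column 2 (its own group) is irrelevant
intercept, coef = group_boost(X, y, groups=[[0, 1], [2]])
print(intercept, [round(c, 4) for c in coef])  # → 0.0 [2.0, 1.0, 0.0]
```

In this toy example the irrelevant group is never selected, so its coefficient stays exactly zero, which is the group-selection behaviour the abstract describes.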

15.

Discovering gene expression patterns in time course microarray experiments by ANOVA-SCA (total citations: 2; self-citations: 0; citations by others: 2)

Nueda MJ, Conesa A, Westerhuis JA, Hoefsloot HC, Smilde AK, Talón M, Ferrer A. Bioinformatics (Oxford, England). 2007;23(14):1792-1800.

16.

Quantile regression models with multivariate failure time data

Yin G, Cai J. Biometrics. 2005;61(1):151-161.

As an alternative to the mean regression model, the quantile regression model has been studied extensively with independent failure time data. However, due to natural or artificial clustering, it is common to encounter multivariate failure time data in biomedical research where the intracluster correlation needs to be accounted for appropriately. For right-censored correlated survival data, we investigate the quantile regression model and adapt an estimating equation approach for parameter estimation under the working independence assumption, as well as a weighted version for enhancing the efficiency. We show that the parameter estimates are consistent and asymptotically follow normal distributions. The variance estimation using asymptotic approximation involves nonparametric functional density estimation. We employ the bootstrap and perturbation resampling methods for the estimation of the variance-covariance matrix. We examine the proposed method for finite sample sizes through simulation studies, and illustrate it with data from a clinical trial on otitis media.
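
Quantile regression replaces the squared error of mean regression with the check (pinball) loss ρ_τ(u) = u(τ − I(u < 0)). A minimal sketch of the simplest case, an intercept-only model on complete, uncensored data: minimising the summed check loss over a constant recovers an empirical τ-quantile. The function names are hypothetical, and the sketch deliberately ignores the right censoring, intracluster correlation, weighting and resampling-based variance estimation that the paper addresses.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def constant_quantile(data, tau):
    """Intercept-only quantile regression: a constant c minimising
    sum_i rho_tau(y_i - c). A minimiser is always attained at an observation,
    so searching over the data points suffices (ties go to the smallest value)."""
    return min(sorted(data), key=lambda c: sum(check_loss(y - c, tau) for y in data))

print(constant_quantile([5, 1, 4, 2, 3], 0.5))   # → 3 (the median)
print(constant_quantile([1, 2, 3, 4, 5], 0.75))  # → 4
```

Asymmetric weighting is the key design choice: with τ = 0.75, under-predictions cost three times as much as over-predictions, which pushes the minimiser up to the upper quartile.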

17.

Improving the accuracy of expression data analysis in time course experiments using resampling

Walter W, Striberny B, Gaquerel E, Baldwin IT, Kim SG, Heiland I. BMC Bioinformatics. 2014;15(1).

18.

HMMGEP: clustering gene expression data using hidden Markov models (total citations: 3; self-citations: 0; citations by others: 3)

Ji X, Yuan Y, Li-Ling J, Li Y, Sun Z. Bioinformatics (Oxford, England). 2004;20(11):1799-1800.

SUMMARY: The package HMMGEP performs cluster analysis on gene expression data using hidden Markov models. AVAILABILITY: HMMGEP, including the source code, documentation and sample data files, is available at http://www.bioinfo.tsinghua.edu.cn:8080/~rich/hmmgep_download/index.html.

19.

Discovering biclusters in gene expression data based on high-dimensional linear geometries

Gan X, Liew AW, Yan H. BMC Bioinformatics. 2008;9(1):209.

20.

Learning rule-based models of biological process from gene expression time profiles using gene ontology (total citations: 2; self-citations: 0; citations by others: 2)

Hvidsten TR, Laegreid A, Komorowski J. Bioinformatics (Oxford, England). 2003;19(9):1116-1123.

MOTIVATION: Microarray technology enables large-scale inference of the participation of genes in biological process from similar expression profiles. Our aim is to induce classificatory models from expression data and biological knowledge that can automatically associate genes with novel hypotheses of biological process. RESULTS: We report a systematic supervised learning approach to predicting biological process from time series of gene expression data and biological knowledge. Biological knowledge is expressed using gene ontology and this knowledge is associated with discriminatory expression-based features to form minimal decision rules. The resulting rule model is first evaluated on genes coding for proteins with known biological process roles using cross validation. Then it is used to generate hypotheses for genes for which no knowledge of participation in biological process could be found. The theoretical foundation for the methodology based on rough sets is outlined in the paper, and its practical application demonstrated on a data set previously published by Cho et al. (Nat. Genet., 27, 48-54, 2001). AVAILABILITY: The Rosetta system is available at http://www.idi.ntnu.no/~aleks/rosetta. SUPPLEMENTARY INFORMATION: http://www.lcb.uu.se/~hvidsten/bioinf_cho/
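
The rule-based idea can be illustrated with a toy sketch: describe each profile by the direction of change between consecutive time points, then score a conjunctive IF-THEN rule by its support (how many genes match its conditions) and accuracy (the fraction of matching genes that carry the rule's process label). The feature names, the `threshold` value and the labels "A"/"B" are hypothetical; the paper's actual feature templates, gene ontology annotations and rough-set reduct computation in Rosetta are not reproduced.

```python
def trend_features(profile, threshold=0.5):
    """Label each step between consecutive time points as 'up', 'down' or 'flat'."""
    feats = {}
    for t in range(len(profile) - 1):
        d = profile[t + 1] - profile[t]
        feats[f"t{t}->t{t + 1}"] = "up" if d > threshold else "down" if d < -threshold else "flat"
    return feats

def rule_quality(conditions, decision, objects):
    """Support and accuracy of the rule: IF all conditions hold THEN decision.
    objects is a list of (feature dict, label) pairs."""
    matched = [label for feats, label in objects
               if all(feats.get(k) == v for k, v in conditions.items())]
    support = len(matched)
    accuracy = matched.count(decision) / support if support else 0.0
    return support, accuracy

# Hypothetical toy genes with known process labels
genes = [(trend_features([0, 1, 2]), "A"),
         (trend_features([0, 1, 1]), "A"),
         (trend_features([0, 1, 0]), "B")]
print(rule_quality({"t0->t1": "up"}, "A", genes))                    # support 3, accuracy 2/3
print(rule_quality({"t0->t1": "up", "t1->t2": "down"}, "B", genes))  # → (1, 1.0)
```

Adding conditions trades support for accuracy; minimal rules aim to keep only the conditions needed to discriminate between processes.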