Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
2.
Large-scale microarray gene expression studies can provide insight into complex genetic networks and biological pathways. A comprehensive gene expression database was constructed using Affymetrix GeneChip microarrays and RNA isolated from more than 6,400 distinct normal and diseased human tissues. These individual patient samples were grouped into over 700 sample sets based on common tissue and disease morphologies, and each set contained averaged expression data for over 45,000 gene probe sets representing more than 33,000 known human genes. Sample sets were compared to each other in more than 750 normal vs. disease pairwise comparisons. Relative up- or down-regulation patterns of genes across these pairwise comparisons provided unique expression fingerprints that could be compared and matched to a gene of interest using the Match/X algorithm. This algorithm uses the kappa statistic to compute correlations between genes and calculate a distance score between a gene of interest and all other genes in the database. Using cdc2 as a query gene, we identified several hundred genes that had similar expression patterns and highly correlated distance scores. Most of these genes were known components of the cell cycle involved in G2/M progression, spindle function or chromosome arrangement. Some of the identified genes had unknown biological functions but may be related to cdc2-mediated mechanisms based on their closely correlated distance scores. This algorithm may provide novel insights into unknown gene function based on correlation to expression profiles of known genes and can identify elements of cellular pathways and gene interactions in a high-throughput fashion.
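As a rough illustration of how a kappa statistic can compare two expression fingerprints, the sketch below computes Cohen's kappa between two genes' regulation calls across the same set of pairwise comparisons. The abstract does not give the Match/X formulas, so the three-level encoding ("up", "down", "none"), the function names, and the choice of 1 - kappa as a distance score are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical calls."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of comparisons where both genes get the same call.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the marginal call frequencies, summed over calls.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences are constant and identical
    return (observed - expected) / (1.0 - expected)


def kappa_distance(a, b):
    """Hypothetical distance score: 0 for perfect agreement (assumed, not from the source)."""
    return 1.0 - cohen_kappa(a, b)


# Toy fingerprints over six normal vs. disease comparisons (made-up values).
query = ["up", "up", "down", "none", "up", "down"]
gene_x = ["up", "up", "down", "none", "up", "down"]
gene_y = ["down", "up", "none", "none", "down", "up"]

print(kappa_distance(query, gene_x), kappa_distance(query, gene_y))
```

Ranking every gene in a database by such a distance to the query gene would give a list analogous to the one described for cdc2, although the published scoring may differ.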

3.
Gene expression profiles of clinical cohorts can be used to identify genes that are correlated with a clinical variable of interest such as patient outcome or response to a particular drug. However, expression measurements are susceptible to technical bias caused by variation in extraneous factors such as RNA quality and array hybridization conditions. If such technical bias is correlated with the clinical variable of interest, the likelihood of identifying false positive genes is increased. Here we describe a method to visualize an expression matrix as a projection of all genes onto a plane defined by a clinical variable and a technical nuisance variable. The resulting plot indicates the extent to which each gene is correlated with the clinical variable or the technical variable. We demonstrate this method by applying it to three clinical trial microarray data sets, one of which identified genes that may have been driven by a confounding technical variable. This approach can be used as a quality control step to identify data sets that are likely to yield false positive results.
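One simple way to place each gene on such a plane is to use its correlation with the clinical variable as one coordinate and its correlation with the technical variable as the other. The abstract does not specify the projection, so the use of Pearson correlation, the function names, and the toy data below are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def project_genes(expression, clinical, technical):
    """Map each gene to (correlation with clinical, correlation with technical)."""
    return {
        gene: (pearson(values, clinical), pearson(values, technical))
        for gene, values in expression.items()
    }


# Made-up example: six samples, outcome confounded with an RNA quality score.
clinical = [0, 0, 0, 1, 1, 1]
technical = [9.1, 8.7, 8.9, 6.2, 6.0, 6.5]
expression = {
    "gene_a": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
    "gene_flat": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}

coords = project_genes(expression, clinical, technical)
for gene, (r_clinical, r_technical) in coords.items():
    print(gene, round(r_clinical, 3), round(r_technical, 3))
```

In this toy data gene_a correlates strongly with both variables, which is the situation the abstract warns about: the clinical association cannot be separated from the technical one.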

4.
Regulatory motif finding by logic regression (total citations: 1; self-citations: 0; citations by others: 1)

5.
Selection on phenotypes may cause genetic change. To understand the relationship between phenotype and gene expression from an evolutionary viewpoint, it is important to study the concordance between gene expression and profiles of phenotypes. In this study, we use a novel method of clustering to identify genes whose expression profiles are related to a quantitative phenotype. Cluster analysis of gene expression data aims at classifying genes into several different groups based on the similarity of their expression profiles across multiple conditions. The hope is that genes that are classified into the same clusters may share underlying regulatory elements or may be a part of the same metabolic pathways. Current methods for examining the association between phenotype and gene expression are limited to linear association measured by the correlation between individual gene expression values and phenotype. Genes may be associated with the phenotype in a nonlinear fashion. In addition, groups of genes that share a particular pattern in their relationship to phenotype may be of evolutionary interest. In this study, we develop a method to group genes based on orthogonal polynomials under a multivariate Gaussian mixture model. The effect of each expressed gene on the phenotype is partitioned into a cluster mean and a random deviation from the mean. Genes can also be clustered based on a time series. Parameters are estimated using the expectation-maximization algorithm and implemented in SAS. The method is verified with simulated data and demonstrated with experimental data from 2 studies: one clusters genes with respect to severity of disease in Alzheimer's patients, and the other clusters data from a rat fracture healing study over time. We find significant evidence of nonlinear associations in both studies and successfully describe these patterns with our method. We give detailed instructions and provide a working program that allows others to directly implement this method in their own analyses.

6.
Cai B, Dunson DB. Biometrics, 2006, 62(2): 446-457
The generalized linear mixed model (GLMM), which extends the generalized linear model (GLM) to incorporate random effects characterizing heterogeneity among subjects, is widely used in analyzing correlated and longitudinal data. Although there is often interest in identifying the subset of predictors that have random effects, random effects selection can be challenging, particularly when outcome distributions are nonnormal. This article proposes a fully Bayesian approach to the problem of simultaneous selection of fixed and random effects in GLMMs. Integrating out the random effects induces a covariance structure on the multivariate outcome data, and an important problem that we also consider is that of covariance selection. Our approach relies on variable selection-type mixture priors for the components in a special Cholesky decomposition of the random effects covariance. A stochastic search MCMC algorithm is developed, which relies on Gibbs sampling, with Taylor series expansions used to approximate intractable integrals. Simulated data examples are presented for different exponential family distributions, and the approach is applied to discrete survival data from a time-to-pregnancy study.

7.
The major regions coding for the transfer RNA genes in the mitochondrial DNA of K. lactis were studied. Twenty-one of a supposed twenty-four tRNA genes were identified and localized with respect to other mitochondrial genes. Most of the tRNA genes were found in a cluster downstream of the large ribosomal RNA gene. The order of a few groups of genes is conserved with respect to S. cerevisiae and T. glabrata. The highly diverged intergenic sequences contained a large number of guanine-cytosine clusters which frequently formed long palindromic sequences.

8.
9.
Summary. Multiple outcomes are often used to properly characterize an effect of interest. This article discusses model-based statistical methods for the classification of units into one of two or more groups where, for each unit, repeated measurements over time are obtained on each outcome. We relate the observed outcomes using multivariate nonlinear mixed-effects models to describe evolutions in different groups. Due to its flexibility, the random-effects approach for the joint modeling of multiple outcomes can be used to estimate population parameters for a discriminant model that classifies units into distinct predefined groups or populations. Parameter estimation is done via the expectation-maximization algorithm with a linear approximation step. We conduct a simulation study that sheds light on the effect that the linear approximation has on classification results. We present an example using data from a study in 161 pregnant women in Santiago, Chile, where the main interest is to predict normal versus abnormal pregnancy outcomes.

10.
11.
Zhou C, Wakefield J. Biometrics, 2006, 62(2): 515-525
In recent years there has been great interest in making inference for gene expression data collected over time. In this article, we describe a Bayesian hierarchical mixture model for partitioning such data. While conventional approaches cluster the observed data, we assume a nonparametric, random walk model, and partition on the basis of the parameters of this model. The model is flexible and can be tuned to the specific context, respects the order of observations within each curve, acknowledges measurement error, and allows prior knowledge on parameters to be incorporated. The number of partitions may also be treated as unknown, and inferred from the data, in which case computation is carried out via a birth-death Markov chain Monte Carlo algorithm. We first examine the behavior of the model on simulated data, along with a comparison with more conventional approaches, and then analyze meiotic expression data collected over time on fission yeast genes.

12.
13.
The importance of multispecies models for understanding complex ecological processes and interactions is beginning to be realized. Recent developments, such as those by Lahoz‐Monfort et al. (2011), have enabled synchrony in demographic parameters across multiple species to be explored. Species in a similar environment would be expected to be subject to similar exogenous factors, although their response to each of these factors may be quite different. The ability to group species together according to how they respond to a particular measured covariate may be of particular interest to ecologists. We fit a multispecies model to two sets of similar species of garden bird monitored under the British Trust for Ornithology's Garden Bird Feeding Survey. Posterior model probabilities were estimated using the reversible jump algorithm to compare posterior support for competing models with different species sharing different subsets of regression coefficients. There was frequently good agreement between species with small asynchronous random‐effect components and those with posterior support for models with shared regression coefficients; however, this was not always the case. When groups of species were less correlated, greater uncertainty was found in whether regression coefficients should be shared or not. The methods outlined in this study can test additional hypotheses about the similarities or synchrony across multiple species that share the same environment. Through the use of posterior model probabilities, estimated using the reversible jump algorithm, we can detect multispecies responses in relation to measured covariates across any combination of species and covariates under consideration. The method can account for synchrony across species in relation to measured covariates, as well as unexplained variation accounted for using random effects. For more flexible, multiparameter distributions, the support for species‐specific parameters can also be measured.

14.
INCLUSive allows automatic multistep analysis of microarray data (clustering and motif finding). The clustering algorithm (adaptive quality-based clustering) groups together genes with highly similar expression profiles. The upstream sequences of the genes belonging to a cluster are automatically retrieved from GenBank and can be fed directly into Motif Sampler, a Gibbs sampling algorithm that retrieves statistically over-represented motifs in sets of sequences, in this case upstream regions of co-expressed genes.

15.
MOTIVATION: There is a growing interest in extracting statistical patterns from gene expression time-series data, in which a key challenge is the development of stable and accurate probabilistic models. Currently popular models, however, would be computationally prohibitive unless some independence assumptions are made to describe large-scale data. We propose an unsupervised conditional random fields (CRF) model to overcome this problem by progressively infusing information into the labelling process through a small variable voting pool. RESULTS: An unsupervised CRF model is proposed for efficient analysis of gene expression time series and is successfully applied to gene class discovery and class prediction. The proposed model treats each time series as a random field and assigns an optimal cluster label to each time series, so as to partition the time series into clusters without a priori knowledge about the number of clusters and the initial centroids. Another advantage of the proposed method is the relaxation of independence assumptions.

16.
Recently, there has been a great deal of interest in the analysis of multivariate survival data. In most epidemiological studies, survival times of the same cluster are related because of some unobserved risk factors such as environmental or genetic factors. Therefore, modelling of dependence between events of correlated individuals is required to ensure a correct inference on the effects of treatments or covariates on the survival times. In the past decades, extensions of the proportional hazards model have been widely considered for modelling multivariate survival data by incorporating a random effect which acts multiplicatively on the hazard function. In this article, we consider the proportional odds model, which is an alternative to the proportional hazards model in which the hazard ratio between individuals converges to unity eventually. This is a reasonable property particularly when the treatment effect fades out gradually and the homogeneity of the population increases over time. The objective of this paper is to assess the influence of the random effect on the within‐subject correlation and the population heterogeneity. We are particularly interested in the properties of the proportional odds model with univariate random effect and correlated random effect. The correlations between survival times are derived explicitly for both choices of mixing distributions and are shown to be independent of the covariates. The time path of the odds function among the survivors is also examined to study the effect of the choice of mixing distribution. Modelling multivariate survival data using a univariate mixing distribution may be inadequate as the random effect not only characterises the dependence of the survival times, but also the conditional heterogeneity among the survivors. A robust estimate for the correlation of the logarithm of the survival times within a cluster is obtained disregarding the choice of the mixing distributions. The sensitivity of the estimate of the regression parameter under a misspecification of the mixing distribution is studied through simulation. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

17.
Association Models for Clustered Data with Binary and Continuous Responses (total citations: 1; self-citations: 0; citations by others: 1)
Summary. We consider analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a binary and a continuous outcome. We propose a new bivariate random effects model that induces associations among the binary outcomes within a cluster, among the continuous outcomes within a cluster, between a binary outcome and a continuous outcome from different subjects within a cluster, as well as the direct association between the binary and continuous outcomes within the same subject. For the ease of interpretations of the regression effects, the marginal model of the binary response probability integrated over the random effects preserves the logistic form and the marginal expectation of the continuous response preserves the linear form. We implement maximum likelihood estimation of our model parameters using standard software such as PROC NLMIXED of SAS. Our simulation study demonstrates the robustness of our method with respect to the misspecification of the regression model as well as the random effects model. We illustrate our methodology by analyzing a developmental toxicity study of ethylene glycol in mice.

18.
In surveillance studies of periodontal disease, the relationship between disease and other health and socioeconomic conditions is of key interest. To determine whether a patient has periodontal disease, multiple clinical measurements (e.g., clinical attachment loss, alveolar bone loss, and tooth mobility) are taken at the tooth‐level. Researchers often create a composite outcome from these measurements or analyze each outcome separately. Moreover, patients have varying numbers of teeth, with those who are more prone to the disease having fewer teeth compared to those with good oral health. Such dependence between the outcome of interest and cluster size (number of teeth) is called informative cluster size, and results obtained from fitting conventional marginal models can be biased. We propose a novel method to jointly analyze multiple correlated binary outcomes for clustered data with informative cluster size using the class of generalized estimating equations (GEE) with cluster‐specific weights. We compare our proposed multivariate outcome cluster‐weighted GEE results to those from the conventional GEE using the baseline data from the Veterans Affairs Dental Longitudinal Study. In an extensive simulation study, we show that our proposed method yields estimates with minimal relative biases and excellent coverage probabilities.
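The effect of informative cluster size can be seen in a toy calculation. The sketch below compares a pooled tooth-level proportion with a cluster-weighted one in which each patient contributes equally. It assumes inverse-cluster-size weights and a single binary outcome with no covariates, which is far simpler than the multivariate GEE method described above; the function names and data are made up for illustration.

```python
def pooled_proportion(clusters):
    """Unweighted estimate: every tooth counts equally, so large clusters dominate."""
    total_teeth = sum(len(teeth) for teeth in clusters)
    if total_teeth == 0:
        raise ValueError("no observations")
    return sum(sum(teeth) for teeth in clusters) / total_teeth


def cluster_weighted_proportion(clusters):
    """Each tooth is weighted by 1 / (cluster size), so every patient counts equally."""
    non_empty = [teeth for teeth in clusters if teeth]
    if not non_empty:
        raise ValueError("no observations")
    return sum(sum(teeth) / len(teeth) for teeth in non_empty) / len(non_empty)


# Made-up data: 1 = diseased tooth, 0 = healthy tooth.
patients = [
    [1, 1, 1, 0],    # disease-prone patient with only 4 remaining teeth
    [0] * 19 + [1],  # healthy patient with 20 teeth
]

print(pooled_proportion(patients))
print(cluster_weighted_proportion(patients))
```

Here the pooled estimate is pulled toward the patient with more teeth, while the weighted estimate describes a typical patient; which target is appropriate depends on the scientific question.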

19.
20.
Bacterial isolates were obtained from two sites in New Zealand contaminated with polycyclic aromatic hydrocarbons. Isolates capable of degrading polycyclic aromatic hydrocarbons were characterized in two mycobacterial groups according to phenotypic properties. These groups were supported by random amplified polymorphic DNA analysis. Nucleotide sequences of 16S ribosomal RNA genes from isolates representing each group were determined and compared with other mycobacterial 16S ribosomal RNA sequences. The taxonomic relationships of these isolates are considered.
