Microarrays have become a standard tool for investigating gene function, and more complex microarray experiments are increasingly being conducted. For example, an experiment may involve samples from several groups or may investigate changes in gene expression over time for several subjects, leading to large three-way data sets. In response to this increase in data complexity, we propose some extensions to the plaid model, a biclustering method developed for the analysis of gene expression data. This model-based method lends itself to the incorporation of any additional structure such as external grouping or repeated measures. We describe how the extended models may be fitted and illustrate their use on real data.

Empirical Bayes Gibbs sampling   Total citations: 3, self-citations: 0, citations by others: 3
The wide applicability of Gibbs sampling has increased the use of more complex and multi-level hierarchical models. Using these models entails dealing with hyperparameters in the deeper levels of a hierarchy. There are three typical methods for dealing with these hyperparameters: specify them, estimate them, or use a 'flat' prior. Each of these strategies has its own associated problems. In this paper, using an empirical Bayes approach, we show how the hyperparameters can be estimated in a way that is both computationally feasible and statistically valid.

Capture-recapture estimation via Gibbs sampling   Total citations: 6, self-citations: 0, citations by others: 6
GEORGE  EDWARD I. 《Biometrika》1992,79(4):677-683



The extended use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states.

Protein backbones have characteristic secondary structures, including α-helices and β-sheets. Which structure is adopted locally is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are valuable for both secondary-structure prediction and protein design. For the case of α-helix caps, we test whether the information content of the sequence–structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Then Gibbs sampling is repeated, allowing for frameshifts of ±1 amino acid residue, in order to find sequence motifs of higher total information content. All helix-cap motifs were found to have good generalization capability, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs using all training examples yielded the best results. Frameshift motifs using a fraction of all training examples performed best in terms of true positives among top predictions. However, motifs without frameshifts also performed well, despite a roughly one-third lower total information content.

The Gibbs sampling method has been widely used for sequence analysis after it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then, numerous variants of the original idea have emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison-based gene prediction methods which make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.

On Gibbs sampling for state space models   Total citations: 26, self-citations: 0, citations by others: 26
CARTER  C. K.; KOHN  R. 《Biometrika》1994,81(3):541-553
We show how to use the Gibbs sampler to carry out Bayesian inference on a linear state space model with errors that are a mixture of normals and coefficients that can switch over time. Our approach simultaneously generates the whole of the state vector given the mixture and coefficient indicator variables and simultaneously generates all the indicator variables conditional on the state vectors. The states are generated efficiently using the Kalman filter. We illustrate our approach by several examples and empirically compare its performance to another Gibbs sampler where the states are generated one at a time. The empirical results suggest that our approach is both practical to implement and dominates the Gibbs sampler that generates the states one at a time.
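
Generating the whole state vector at once, as this abstract describes, is commonly done by running a Kalman filter forward and then sampling the states backward. The following is a minimal sketch of that idea for a scalar local-level model, not the authors' code. The function name `ffbs_local_level`, the noise variances `q` and `r`, and the diffuse prior `m0`, `p0` are assumptions made for illustration, and the mixture and switching-coefficient indicators are left out.

```python
import random


def ffbs_local_level(y, q, r, m0=0.0, p0=1e6, rng=None):
    """Draw one joint sample of the states x_1..x_T given observations y.

    Assumed toy model (hypothetical, for illustration only):
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
        y_t = x_t + v_t,      v_t ~ N(0, r)
    """
    rng = rng or random.Random(0)
    if not y:
        return []

    # Forward pass: Kalman filter, storing filtered means and variances.
    means, variances = [], []
    m, p = m0, p0
    for obs in y:
        pred_var = p + q                    # variance of x_t before seeing y_t
        gain = pred_var / (pred_var + r)    # Kalman gain
        m = m + gain * (obs - m)            # filtered mean
        p = (1.0 - gain) * pred_var         # filtered variance
        means.append(m)
        variances.append(p)

    # Backward pass: sample x_T, then x_t given x_{t+1} for t = T-1, ..., 1.
    states = [0.0] * len(y)
    states[-1] = rng.gauss(means[-1], variances[-1] ** 0.5)
    for t in range(len(y) - 2, -1, -1):
        j = variances[t] / (variances[t] + q)
        cond_mean = means[t] + j * (states[t + 1] - means[t])
        cond_var = variances[t] * (1.0 - j)
        states[t] = rng.gauss(cond_mean, cond_var ** 0.5)
    return states
```

Inside a full Gibbs sampler, a call such as `ffbs_local_level(y, q, r)` would be one block update, alternating with draws of the remaining unknowns conditional on the sampled states.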

Biclustering algorithms for biological data analysis: a survey   Total citations: 7, self-citations: 0, citations by others: 7
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.

We propose a new Markov Chain Monte Carlo (MCMC) sampling mechanism for Bayesian phylogenetic inference. This method, which we call conjugate Gibbs, relies on analytical conjugacy properties, and is based on an alternation between data augmentation and Gibbs sampling. The data augmentation step consists of sampling a detailed substitution history for each site, and across the whole tree, given the current value of the model parameters. Provided convenient priors are used, the parameters of the model can then be directly updated by a Gibbs sampling procedure, conditional on the current substitution history. Alternating between these two sampling steps yields an MCMC device whose equilibrium distribution is the posterior probability density of interest. We show, on real examples, that this conjugate Gibbs method leads to a significant improvement of the mixing behavior of the MCMC. In all cases, the decorrelation times of the resulting chains are smaller than those obtained by standard Metropolis-Hastings procedures by at least one order of magnitude. The method is particularly well suited to heterogeneous models, i.e. those assuming site-specific random variables. In particular, the conjugate Gibbs formalism allows one to propose efficient implementations of complex models, for instance assuming site-specific substitution processes, that would not be accessible to standard MCMC methods.

Zeng W  Ghosh S  Li B 《Genetical research》2004,83(2):143-154
Diallel mating is a frequently used design for estimating the additive and dominance genetic (polygenic) effects involved in quantitative traits observed in the half- and full-sib progenies generated in plant breeding programmes. Gibbs sampling has been used for making statistical inferences for a mixed-inheritance model (MIM) that includes both major genes and polygenes. However, using this approach it has not been possible to incorporate the genetic properties of major genes with the additive and dominance polygenic effects in a diallel mating population. A parent block Gibbs sampling method was developed in this study to make statistical inferences about the major gene and polygenic effects on quantitative traits for progenies derived from a half-diallel mating design. Using simulated data sets with different major and polygenic effects, the proposed method accurately estimated the major and polygenic effects of quantitative traits, and possible genotypes of parents and progenies. The impact of specifying different prior distributions was examined and was found to have little effect on inference on the posterior distribution. This approach was applied to an experimental data set of Loblolly pine (Pinus taeda L.) derived from a 6-parent half-diallel mating. The result indicated that there might be a recessive major gene affecting height growth in this diallel population.

A Gibbs sampling approach to linkage analysis.   Total citations: 9, self-citations: 0, citations by others: 9
We present a Monte Carlo approach to estimation of the recombination fraction theta and the profile likelihood for a dichotomous trait and a single marker gene with 2 alleles. The method is an application of a technique known as 'Gibbs sampling', in which random samples of each of the unknowns (here genotypes, theta and nuisance parameters, including the allele frequencies and the penetrances) are drawn from their posterior distributions, given the data and the current values of all the other unknowns. Upon convergence, the resulting samples derive from the marginal distribution of all the unknowns, given only the data, so that the uncertainty in the specification of the nuisance parameters is reflected in the variance of the posterior distribution of theta. Prior knowledge about the distribution of theta and the nuisance parameters can be incorporated using a Bayesian approach, but adoption of a flat prior for theta and point priors for the nuisance parameters would correspond to the standard likelihood approach. The method is easy to program, runs quickly on a microcomputer, and could be generalized to multiple alleles, multipoint linkage, continuous phenotypes and more complex models of disease etiology. The basic approach is illustrated by application to data on cholesterol levels and a low-density lipoprotein receptor gene in a single large pedigree.
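
The core loop described in this abstract, drawing each unknown from its posterior given the data and the current values of the other unknowns, can be sketched on a much simpler problem. The example below is a toy illustration, not the linkage model: it assumes normally distributed data with unknown mean `mu` and variance `sigma2` under the prior p(mu, sigma2) ∝ 1/sigma2, and the function name `gibbs_normal` and its parameters are hypothetical.

```python
import random
import statistics


def gibbs_normal(y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for the mean and variance of normal data.

    Assumed toy model (not the linkage model of the abstract):
        y_i ~ N(mu, sigma2),  prior p(mu, sigma2) proportional to 1 / sigma2
    Returns the list of (mu, sigma2) draws kept after burn-in.
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    sigma2 = statistics.variance(y)  # starting value
    draws = []
    for it in range(n_iter):
        # mu | sigma2, y ~ Normal(ybar, sigma2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # sigma2 | mu, y ~ Inverse-Gamma(n / 2, ss / 2), drawn as the
        # reciprocal of a Gamma variate with shape n / 2 and scale 2 / ss
        ss = sum((v - mu) ** 2 for v in y)
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / ss)
        if it >= burn_in:
            draws.append((mu, sigma2))
    return draws
```

Averaging the retained `mu` draws approximates the posterior mean of the mean, and the spread of those draws reflects the uncertainty in `sigma2`, which is the same mechanism by which nuisance-parameter uncertainty reaches the posterior of theta in the abstract.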

Chaussabel D  Sher A 《Genome biology》2002,3(10):research0055.1-research005516


The rapidly expanding fields of genomics and proteomics have prompted the development of computational methods for managing, analyzing and visualizing expression data derived from microarray screening. Nevertheless, the lack of efficient techniques for assessing the biological implications of gene-expression data remains an important obstacle in exploiting this information.

Background: Developing appropriate computational tools to distill biological insights from large-scale gene expression data has been an important part of systems biology. Considering that gene relationships may change or only exist in a subset of collected samples, biclustering that involves clustering both genes and samples has become increasingly important, especially when the samples are pooled from a wide range of experimental conditions. Methods: In this paper, we introduce a new biclustering algorithm to find subsets of genomic expression features (EFs) (e.g., genes, isoforms, exon inclusion) that show strong “group interactions” under certain subsets of samples. Group interactions are defined by strong partial correlations, or equivalently, conditional dependencies between EFs after removing the influences of a set of other functionally related EFs. Our new biclustering method, named SCCA-BC, extends an existing method for group interaction inference, which is based on sparse canonical correlation analysis (SCCA) coupled with repeated random partitioning of the gene expression data set. Results: SCCA-BC gives sensible results on real data sets and outperforms most existing methods in simulations. Software is available at https://github.com/pimentel/scca-bc. Conclusions: SCCA-BC seems to work in numerous conditions and the results seem promising for future extensions. SCCA-BC has the ability to find different types of bicluster patterns, and it is especially advantageous in identifying a bicluster whose elements share the same progressive and multivariate normal distribution with a dense covariance matrix.

Yu  Yun  Jermaine  Christopher  Nakhleh  Luay 《BMC genomics》2016,17(10):784-124


Phylogenetic networks are leaf-labeled graphs used to model and display complex evolutionary relationships that do not fit a single tree. There are two classes of phylogenetic networks: Data-display networks and evolutionary networks. While data-display networks are very commonly used to explore data, they are not amenable to incorporating probabilistic models of gene and genome evolution. Evolutionary networks, on the other hand, can accommodate such probabilistic models, but they are not commonly used for exploration.


In this work, we show how to turn evolutionary networks into a tool for statistical exploration of phylogenetic hypotheses via a novel application of Gibbs sampling. We demonstrate the utility of our work on two recently available genomic data sets, one from a group of mosquitos and the other from a group of modern birds. We demonstrate that our method allows the use of evolutionary networks not only for explicit modeling of reticulate evolutionary histories, but also for exploring conflicting treelike hypotheses. We further demonstrate the performance of the method on simulated data sets, where the true evolutionary histories are known.


We introduce an approach to explore phylogenetic hypotheses over evolutionary phylogenetic networks using Gibbs sampling. The hypotheses could involve reticulate and non-reticulate evolutionary processes simultaneously as we illustrate on mosquito and modern bird genomic data sets.

Wigle DA  Rossant J  Jurisica I 《Genome biology》2001,2(7):reviews1019.1-reviews10194
Microarrays of mouse genes are now available from several sources, and they have so far given new insights into gene expression in embryonic development, regions of the brain and during apoptosis. Microarray data posted on the internet can be reanalyzed to study a range of questions.

The paper presents a method of multivariate data analysis described by a model which involves fixed effects, additive polygenic individual effects and the effects of a major gene. To estimate the model parameters, the likelihood function is maximized. The maximum of the likelihood function is computed using the Gibbs sampling approach. In this approach, values of all unknown parameters are generated from their conditional posterior distributions. On the basis of the obtained samples, the marginal posterior densities as well as the estimates of fixed effects, gene frequency, genotypic values, and major gene, polygenic and error (co)variances are calculated. A numerical example, supplementing the theoretical considerations, deals with data simulated according to the considered model.



Pre-processing methods for two-sample long oligonucleotide arrays, specifically the Agilent technology, have not been extensively studied. The goal of this study is to quantify some of the sources of error that affect measurement of expression using Agilent arrays and to compare Agilent's Feature Extraction software with pre-processing methods that have become the standard for normalization of cDNA arrays. These include log transformation followed by loess normalization, with or without background subtraction, and often a between-array scale normalization procedure. The larger goal is to define best study design and pre-processing practices for Agilent arrays, and we offer some suggestions.
