Similar Articles
20 similar articles found.
1.

Background  

Cluster analysis has become a standard computational method for gene function discovery as well as for more general explanatory data analysis. A number of different approaches have been proposed for that purpose, among which mixture models provide a principled probabilistic framework. Cluster analysis is nowadays increasingly supplemented with multiple data sources, and these heterogeneous sources of information should be used as efficiently as possible.

2.
3.
4.
Zhou C, Wakefield J. Biometrics, 2006, 62(2): 515-525.
In recent years there has been great interest in making inference for gene expression data collected over time. In this article, we describe a Bayesian hierarchical mixture model for partitioning such data. While conventional approaches cluster the observed data, we assume a nonparametric, random walk model, and partition on the basis of the parameters of this model. The model is flexible and can be tuned to the specific context, respects the order of observations within each curve, acknowledges measurement error, and allows prior knowledge on parameters to be incorporated. The number of partitions may also be treated as unknown, and inferred from the data, in which case computation is carried out via a birth-death Markov chain Monte Carlo algorithm. We first examine the behavior of the model on simulated data, along with a comparison with more conventional approaches, and then analyze meiotic expression data collected over time on fission yeast genes.

5.
A latent-class mixture model for incomplete longitudinal Gaussian data
Summary. In analyses of incomplete longitudinal clinical trial data, there has been a shift away from simple methods that are valid only if the data are missing completely at random, toward more principled ignorable analyses, which are valid under the less restrictive missing at random assumption. The standard statistical software now available makes such analyses possible in practice. While the possibility of data missing not at random (MNAR) cannot be ruled out, it is argued that analyses valid under MNAR are not well suited for the primary analysis in clinical trials. Rather than either ignoring MNAR or blindly shifting to an MNAR framework, the optimal place for MNAR analyses is within a sensitivity-analysis context. One route for sensitivity analysis is to consider, alongside selection models, pattern-mixture models or shared-parameter models. The latter can also be extended to a latent-class mixture model, the approach taken in this article. The performance of the resulting flexible model is assessed through simulations, and the model is applied to data from a depression trial.

6.
Naskar M, Das K, Ibrahim JG. Biometrics, 2005, 61(3): 729-737.
A very general class of multivariate life distributions is considered for analyzing clustered failure time data that are subject to censoring and multiple modes of failure. Conditional on cluster-specific quantities, the joint distribution of the failure time and event indicator can be expressed as a mixture of the distribution of time to failure due to a certain type (or specific cause) and the failure type distribution. We assume here that the marginal probabilities of the various failure types are logistic functions of some covariates. The cluster-specific quantities are subject to an unknown distribution that causes frailty. The unknown frailty distribution is modeled nonparametrically using a Dirichlet process. In such a semiparametric setup, a hybrid method of estimation is proposed based on the i.i.d. Weighted Chinese Restaurant algorithm, which helps us generate observations from the predictive distribution of the frailty. The Monte Carlo ECM algorithm plays a vital role in obtaining estimates of the parameters that assess the extent of the effects of the causal factors for failures of a certain type. A simulation study is conducted to assess the consistency of our methodology. The proposed methodology is used to analyze a real data set on HIV infection in a cohort of female prostitutes in Senegal.
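The abstract relies on a Weighted Chinese Restaurant algorithm to draw frailties from the predictive distribution under a Dirichlet process prior. As a point of reference only, the sketch below implements the plain (unweighted) Chinese restaurant / Pólya urn predictive draw that such schemes build on; it is not the weighted algorithm of the paper. The concentration `alpha` (assumed > 0), the gamma base measure, and the function names are illustrative assumptions.

```python
import random


def crp_predictive_draw(previous, alpha, base_draw, rng):
    """Draw one value from the Dirichlet-process predictive (Polya urn).

    With probability alpha / (alpha + n) take a fresh draw from the base
    measure; otherwise copy one of the n previous values chosen uniformly.
    Assumes alpha > 0, so the very first draw always comes from the base.
    """
    n = len(previous)
    if rng.random() < alpha / (alpha + n):
        return base_draw(rng)
    return rng.choice(previous)


def crp_sample(n, alpha, base_draw, seed=0):
    """Sequentially draw n values; tied values form the clusters."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        values.append(crp_predictive_draw(values, alpha, base_draw, rng))
    return values


# Hypothetical base measure: gamma-distributed frailties with mean 1.
frailties = crp_sample(20, alpha=1.0, base_draw=lambda r: r.gammavariate(2.0, 0.5))
print(len(set(frailties)), "distinct frailty values among", len(frailties))
```

Repeated values among the draws act as shared frailties (clusters); a larger `alpha` yields more distinct values.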

7.
Uebersax JS, Grove WM. Biometrics, 1993, 49(3): 823-835.
This article presents a latent distribution model for the analysis of agreement on dichotomous or ordered category ratings. The model includes parameters that characterize bias, category definitions, and measurement error for each rater or test. Parameter estimates can be used to evaluate rater performance and to improve classification or measurement through the use of multiple ratings. A simple maximum likelihood estimation procedure is described. Two examples illustrate the approach. Although considered in the context of analyzing rater agreement, the model provides a general approach for mixture analysis using two or more ordered-category measures.
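A minimal sketch of the kind of latent-distribution rating model described above, assuming a probit (normal-error) form: the rater perceives the latent trait plus a rater-specific bias and normal measurement error, and reports the category whose threshold interval contains that perceived value. The function names and this exact parametrization are illustrative assumptions, not necessarily the paper's; a dichotomous rating corresponds to a single threshold.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function (handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rating_probabilities(trait, bias, thresholds, sigma):
    """P(rating = k | latent trait) for k = 0..len(thresholds).

    The rater perceives trait + bias + N(0, sigma^2) noise; the thresholds
    act as the rater's category boundaries (category definitions).
    """
    cuts = [-math.inf] + sorted(thresholds) + [math.inf]
    # P(perceived value <= c) for every boundary c.
    cdf = [normal_cdf((c - trait - bias) / sigma) for c in cuts]
    # Probability of each category = difference between adjacent boundaries.
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]
```

For example, `rating_probabilities(0.0, 0.0, [-1.0, 1.0], 1.0)` places about 68% of the mass in the middle category; a positive `bias` shifts mass toward higher categories.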

8.
9.
Recent developments in mass-spectrometry-based shotgun proteomics, especially methods using spectral counting, have enabled large-scale identification and differential profiling of complex proteomes. Most such proteomic studies aim to identify proteins whose abundance differs across conditions. Several quantitative methods have recently been proposed and implemented for this purpose. Building on techniques now widely accepted in the microarray literature, we developed and implemented a new method that uses a Bayesian model to calculate posterior probabilities of differential abundance for thousands of proteins in an experiment simultaneously. Our Bayesian model is shown to deliver uniformly superior performance compared with several existing methods.

10.
Pauler DK, Laird NM. Biometrics, 2000, 56(2): 464-472.
In clinical trials of a self-administered drug, repeated measures of a laboratory marker, which is affected by study medication and collected in all treatment arms, can provide valuable information on population and individual summaries of compliance. In this paper, we introduce a general finite mixture of nonlinear hierarchical models that allows estimates of component membership probabilities and random effect distributions for longitudinal data arising from multiple subpopulations, such as from noncomplying and complying subgroups in clinical trials. We outline a sampling strategy for fitting these models, which consists of a sequence of Gibbs, Metropolis-Hastings, and reversible jump steps, where the latter is required for switching between component models of different dimensions. Our model is applied to identify noncomplying subjects in the placebo arm of a clinical trial assessing the effectiveness of zidovudine (AZT) in the treatment of patients with HIV, where noncompliance was defined as initiation of AZT during the trial without the investigators' knowledge. We fit a hierarchical nonlinear change-point model for increases in the marker MCV (mean corpuscular volume of erythrocytes) for subjects who noncomply and a constant mean random effects model for those who comply. As part of our fully Bayesian analysis, we assess the sensitivity of conclusions to prior and modeling assumptions and demonstrate how external information and covariates can be incorporated to distinguish subgroups.
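To illustrate how component membership probabilities arise in such a finite mixture, here is a deliberately simplified sketch: one subject's MCV series is compared under a hypothetical broken-stick mean for noncompliers (flat, then a linear rise after a change point) and a constant mean for compliers, with independent normal errors. The linear post-change shape, the fixed known parameters, and the absence of random effects and of the Gibbs / Metropolis-Hastings / reversible-jump sampler are all simplifying assumptions, not the paper's model.

```python
import math


def broken_stick_mean(t, baseline, change_point, slope):
    """Hypothetical noncomplier mean: flat until change_point, then a linear rise."""
    return baseline + slope * max(0.0, t - change_point)


def gaussian_loglik(ys, means, sigma):
    """Log-likelihood of ys under independent N(mean, sigma^2) errors."""
    return sum(-0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - m) ** 2 / (2.0 * sigma ** 2)
               for y, m in zip(ys, means))


def prob_noncomplier(times, ys, pi_nc, baseline, change_point, slope, sigma):
    """Posterior probability that one subject belongs to the noncomplying component."""
    nc_means = [broken_stick_mean(t, baseline, change_point, slope) for t in times]
    a = math.log(pi_nc) + gaussian_loglik(ys, nc_means, sigma)
    b = math.log(1.0 - pi_nc) + gaussian_loglik(ys, [baseline] * len(ys), sigma)
    # Normalize on the log scale to avoid underflow.
    m = max(a, b)
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))
```

In the full model these membership probabilities would be updated inside the sampler together with the component-specific random effects.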

11.
Magnussen S. Génome, 1992, 35(6): 931-938.
A regression model to predict quantiles of narrow-sense individual and family-mean heritabilities is developed and used to predict confidence intervals, either directly or via a generalized beta distribution model. Extensive simulations of balanced sib-analysis trials in randomized complete block designs, with normally distributed environmental and additive genetic effects, confirmed that heritabilities follow a beta distribution even when up to 10% of the data are missing at random. The new model is both more accurate and more precise than commonly used alternatives based on "exact" χ² distributions and Satterthwaite's approximation to the degrees of freedom. Estimates of the expected heritability and a Taylor approximation of the standard error of the heritability are needed as input to the quantile model. Applications of the presented models for estimating confidence intervals and as an aid in the design of experiments are provided.

12.
Protein identification is an essential step in mass spectrometry (MS) based proteome research. Many protein identification strategies use either MS data or MS/MS data for database searching. While MS-based methods provide wider coverage than MS/MS-based methods, their identification accuracy is lower because MS data carry less information than MS/MS data. More sophisticated algorithms are therefore needed to achieve higher identification accuracy from MS data. Peptide Mass Fingerprinting (PMF) has been widely used for many years to identify single purified proteins from MS data. In this paper, we extend this technology to protein mixture identification. First, we formulate protein mixture identification as a Partial Set Covering (PSC) problem. Then, we present several algorithms that solve the PSC problem efficiently. Finally, we extend the partial set covering model to MS/MS data and to the combination of MS and MS/MS data. Experimental results on simulated and real data demonstrate the advantages of our method: 1) it significantly outperforms previous MS-based approaches; 2) it is useful for MS/MS-based protein inference; and 3) it combines MS and MS/MS data in a unified model, further improving identification performance.
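The paper presents its own PSC algorithms, which are not reproduced here. As a hedged illustration of the problem formulation, the sketch below uses the standard greedy heuristic for partial set cover: each candidate protein is treated as the set of observed peak masses it could explain, and proteins are added, most-explaining first, until at least `target` peaks are covered. The data layout, function name, and greedy rule are illustrative assumptions.

```python
def greedy_partial_set_cover(universe, candidate_sets, target):
    """Greedily pick sets until at least `target` elements of `universe` are covered.

    candidate_sets maps a name (e.g. a protein ID) to the set of elements it
    explains (e.g. observed peak masses). Returns (chosen names, covered set).
    """
    universe = set(universe)
    covered = set()
    chosen = []
    remaining = dict(candidate_sets)
    while len(covered) < target and remaining:
        # Pick the set explaining the most still-uncovered observed elements.
        name, elems = max(remaining.items(),
                          key=lambda kv: len((kv[1] & universe) - covered))
        gain = (elems & universe) - covered
        if not gain:
            break  # no remaining set helps, so the target cannot be reached
        covered |= gain
        chosen.append(name)
        del remaining[name]
    return chosen, covered
```

Stopping at `target` rather than covering every peak is what makes the cover "partial": some observed peaks (noise, modified peptides) are allowed to stay unexplained.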

13.
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it both from a homogeneous process of evolution and from one characterized principally by differences in rates of evolution. We present studies showing that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both of these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate-variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
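A minimal sketch of how a pattern-mixture likelihood combines per-site likelihoods, assuming each site's log-likelihood under each of the K substitution processes has already been computed (for example by Felsenstein pruning on a fixed tree): each site's likelihood is the weighted sum over patterns, so no prior partitioning of sites is needed. The function names and input layout are hypothetical; tree search and the MCMC machinery are omitted.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def mixture_log_likelihood(site_class_logliks, weights):
    """Alignment log-likelihood when each site may follow any of K patterns.

    site_class_logliks[i][k] is log L_k(site i) under pattern k;
    weights[k] is the mixture weight of pattern k (weights sum to 1).
    """
    log_w = [math.log(w) for w in weights]
    return sum(logsumexp([lw + ll for lw, ll in zip(log_w, row)])
               for row in site_class_logliks)


def site_pattern_posteriors(row, weights):
    """Posterior probability that one site evolved under each pattern."""
    terms = [math.log(w) + ll for w, ll in zip(weights, row)]
    total = logsumexp(terms)
    return [math.exp(t - total) for t in terms]
```

With K = 1 and weight 1 this reduces to the ordinary homogeneous-model log-likelihood, in line with the special-case property noted in the abstract.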

14.
This paper presents a study of the diagnosis of "dyspepsia" in 154 patients based on data collected at their initial outpatient attendance through an interview with a non-medically qualified physician's assistant. Patients reacted favourably to this type of interview, and the data recorded were as reliable as those recorded by clinicians. We conclude (1) that the data recorded by the physician's assistant are diagnostically valuable; (2) that where these data cannot be collected by a qualified physician, the task may be delegated to a non-medically qualified person; but (3) that this interview should augment, not replace, the traditional clinical interview.

15.
Finite element (FE) modelling based on data from three-dimensional high-resolution computed tomography (CT) imaging systems provides a non-invasive method to assess structural mechanics. Automated mesh generation from these voxel-based image data can be achieved by direct conversion to hexahedral elements; however, such model representations have jagged edges. This paper proposes an automated method to generate smoothed FE meshes from voxel-based image data. The mesh fairing processes used are computationally efficient and allow constraints to control the smoothing. Exterior surfaces of the mesh, as well as interfaces between two tissues, can be smoothed by varying the fairing parameters and constraint criteria. The method was tested on a variety of real and simulated three-dimensional data sets, resulting in both hexahedral and tetrahedral meshes. It was shown that the fairing process is linearly related to the number of smoothing iterations, and that peak stresses are reduced in FE simulations of the smoothed models. Although developed for micro-CT data sets, this fast and reliable mesh-smoothing method could be applied to any three-dimensional image data where node and element connectivity have been defined.
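The abstract does not specify its fairing operator, so the sketch below uses plain constrained Laplacian smoothing as a stand-in: in each iteration every unconstrained node moves a fraction `lam` toward the centroid of its mesh neighbors, while nodes in `fixed` (for example, nodes on an interface that must not move) stay put. The function name, the step size, and the choice of Laplacian smoothing rather than the paper's fairing scheme are assumptions.

```python
def laplacian_smooth(positions, neighbors, fixed, lam=0.5, iterations=10):
    """Move each unconstrained node toward the centroid of its neighbors.

    positions: list of (x, y, z) tuples; neighbors: list of neighbor-index
    lists (the mesh connectivity); fixed: set of node indices kept in place.
    """
    pts = [tuple(p) for p in positions]
    for _ in range(iterations):
        new_pts = []
        for i, p in enumerate(pts):
            nbrs = neighbors[i]
            if i in fixed or not nbrs:
                new_pts.append(p)  # constrained or isolated node: unchanged
                continue
            centroid = [sum(pts[j][d] for j in nbrs) / len(nbrs) for d in range(3)]
            new_pts.append(tuple(p[d] + lam * (centroid[d] - p[d]) for d in range(3)))
        pts = new_pts  # Jacobi-style update: all nodes read the previous iterate
    return pts
```

Plain Laplacian smoothing also shrinks closed surfaces as iterations accumulate; fairing schemes such as Taubin's λ|μ smoothing alternate shrinking and inflating steps to limit this effect.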

16.
Kauermann G, Eilers P. Biometrics, 2004, 60(2): 376-387.
An important goal of microarray studies is the detection of genes that show significant changes in expression when two classes of biological samples are being compared. We present an ANOVA-style mixed model with parameters for array normalization, overall level of gene expression, and change of expression between the classes. For the latter we assume a mixing distribution with a probability mass concentrated at zero, representing genes with no changes, and a normal distribution representing the level of change for the other genes. We estimate the parameters by optimizing the marginal likelihood. To make this practical, Laplace approximations and a backfitting algorithm are used. The performance of the model is studied by simulation and by application to publicly available data sets.
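The mixing distribution described here (a point mass at zero plus a normal for the changed genes) gives, for a single gene, a simple posterior probability of change. The sketch assumes the gene's estimated log-ratio `d` is normal with known standard error `se` around its true change, and treats `p0` (the point-mass probability, 0 < p0 < 1) and `tau` (the spread of true changes) as known; the paper instead estimates parameters through the marginal likelihood with Laplace approximations and backfitting, which is not shown. The function names are hypothetical.

```python
import math


def log_normal_pdf(x, var):
    """Log density of N(0, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - x * x / (2.0 * var)


def prob_changed(d, se, p0, tau):
    """Posterior probability that a gene's true change is non-zero.

    Model: d ~ N(delta, se^2); delta = 0 with probability p0 (point mass),
    otherwise delta ~ N(0, tau^2), so marginally d ~ N(0, se^2 + tau^2).
    """
    log_null = math.log(p0) + log_normal_pdf(d, se ** 2)
    log_alt = math.log(1.0 - p0) + log_normal_pdf(d, se ** 2 + tau ** 2)
    # alt / (null + alt), written on the log scale to avoid underflow.
    return 1.0 / (1.0 + math.exp(log_null - log_alt))
```

For instance, with `se=0.2`, `p0=0.9`, `tau=1.0`, an observed `d` of 0 gives a probability of change of about 0.02, while `d=2.0` gives a probability indistinguishable from 1.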

17.
18.

Background  

Allelic-loss studies record data on the loss of genetic material in tumor tissue relative to normal tissue at various loci along the genome. As the deletion of a tumor suppressor gene can lead to tumor development, one objective of these studies is to determine which, if any, chromosome arms harbor tumor suppressor genes.

19.
An exciting biological advance over the past few years is the use of microarray technologies to measure the expression levels of thousands of genes simultaneously. The bottleneck now is how to extract useful information from the resulting large amounts of data. An important and common task in analyzing microarray data is to identify genes with altered expression under two experimental conditions. We propose a nonparametric statistical approach, called the mixture model method (MMM), to handle the problem when there are only a small number of replicates under each experimental condition. Specifically, we propose estimating the distributions of a t-type test statistic and its null statistic using finite normal mixture models. A comparison of these two distributions by means of a likelihood ratio test, or simply using the tail distribution of the null statistic, can identify genes with significantly changed expression. Several methods are proposed to effectively control the false positives. The methodology is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle ear infection.
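MMM fits finite normal mixtures to the distributions of the t-type statistic and of its null statistic. A minimal sketch of fitting a univariate normal mixture by EM is shown below; the number of components `k`, the random initialization, the iteration count, and the variance floor are our assumptions, and the paper's fitting and model-selection details are not reproduced.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_normal_mixture(xs, k, iterations=200, seed=0):
    """Fit a k-component univariate normal mixture to xs by EM."""
    rng = random.Random(seed)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    weights = [1.0 / k] * k
    mus = rng.sample(xs, k)  # initialize means at randomly chosen data points
    variances = [var] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means, variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, mus, variances


def mixture_density(x, weights, mus, variances):
    """Density of the fitted mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances))
```

Fitting one mixture to the observed statistics and another to the null statistics yields two densities that can be compared at each gene's statistic; how MMM turns that comparison into a decision rule with false-positive control is described in the paper, not here.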

20.