Found 20 similar articles (search time: 15 ms)
Hsieh F, Tseng YK, Wang JL. Biometrics 2006, 62(4): 1037-1043
The maximum likelihood approach to jointly modeling the survival time and its longitudinal covariates has been successful in modeling both processes in longitudinal studies. Random effects in the longitudinal process are often used to model the survival times through a proportional hazards model, and this invokes an EM algorithm to search for the maximum likelihood estimates (MLEs). Several intriguing issues are examined here, including the robustness of the MLEs against departure from the normal random effects assumption, and difficulties with the profile likelihood approach in providing reliable estimates for the standard error of the MLEs. We provide insights into the robustness property and suggest overcoming the difficulty of obtaining reliable estimates for the standard errors by using bootstrap procedures. Numerical studies and data analysis illustrate our points.
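The bootstrap idea mentioned in the abstract above can be sketched generically. This is only an illustration under stated assumptions: the joint longitudinal-survival model is replaced by a toy estimator (the MLE of an exponential rate), and the names `bootstrap_se` and `exponential_rate_mle` are hypothetical, not taken from the paper. For a real joint model, each row would be one subject's complete record, so that whole subjects are resampled.

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=500, seed=0):
    """Nonparametric bootstrap standard error of `estimator`.

    Rows of `data` are resampled with replacement; the estimator is
    re-evaluated on each resample and the spread of the results is returned.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample row indices with replacement
        estimates[b] = estimator(data[idx])
    return estimates.std(ddof=1)

def exponential_rate_mle(times):
    """Toy stand-in for a model fit: MLE of an exponential rate (assumption)."""
    return 1.0 / np.mean(times)

# Simulated, uncensored survival times, for illustration only.
rng = np.random.default_rng(1)
times = rng.exponential(scale=2.0, size=200)
se = bootstrap_se(times, exponential_rate_mle)
```

In this toy case the bootstrap value can be compared with the large-sample approximation rate / sqrt(n); the abstract's point is that no such reliable shortcut was available from the profile likelihood in the joint-model setting.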

Han F, Pan W. Biometrics 2012, 68(1): 307-315
Many statistical tests have been proposed for case-control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general and yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is to association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can exclusively focus on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.

Zhang J, Yue C, Zhang YM. Heredity 2012, 108(4): 396-402
A penalized maximum likelihood method has been proposed as an important approach to the detection of epistatic quantitative trait loci (QTL). However, this approach is not optimal in two special situations: (1) closely linked QTL with effects in opposite directions and (2) small-effect QTL, because the method produces downwardly biased estimates of QTL effects. The present study aims to correct the bias by using correction coefficients and shifting from the use of a uniform prior on the variance parameter of a QTL effect to that of a scaled inverse chi-square prior. The results of Monte Carlo simulation experiments show that the improved method increases the power from 25 to 88% in the detection of two closely linked QTL of equal size in opposite directions and from 60 to 80% in the identification of QTL with small effects (0.5% of the total phenotypic variance). We used the improved method to detect QTL responsible for the barley kernel weight trait using 145 doubled haploid lines developed in the North American Barley Genome Mapping Project. Application of the proposed method to other shrinkage estimation of QTL effects is discussed.

MOTIVATION: DNA microarrays allow the simultaneous measurement of thousands of gene expression levels in any given patient sample. Gene expression data have been shown to correlate with survival in several cancers; however, analysis of the data is difficult, since typically at most a few hundred patients are available, resulting in severely underdetermined regression or classification models. Several approaches exist to classify patients in different risk classes; however, relatively little has been done with respect to the prediction of actual survival times. We introduce CASPAR, a novel method to predict true survival times for the individual patient based on microarray measurements. CASPAR is based on a multivariate Cox regression model that is embedded in a Bayesian framework. A hierarchical prior distribution on the regression parameters is specifically designed to deal with high dimensionality (large number of genes) and low sample size settings that are typical for microarray measurements. This enables CASPAR to automatically select small, most informative subsets of genes for prediction. RESULTS: Validity of the method is demonstrated on two publicly available datasets on diffuse large B-cell lymphoma (DLBCL) and on adenocarcinoma of the lung. The method successfully identifies long and short survivors, with high sensitivity and specificity. We compare our method with two alternative methods from the literature, demonstrating superior results of our approach. In addition, we show that CASPAR can further refine predictions made using clinical scoring systems such as the International Prognostic Index (IPI) for DLBCL and clinical staging for lung cancer, thus providing an additional tool for the clinician. An analysis of the genes identified confirms previously published results, and furthermore, new candidate genes correlated with survival are identified.

Lars Kaderali, Thomas Zander, Ulrich Faigle, Jürgen Wolf, Joachim L. Schultze

Generalized hierarchical multivariate CAR models for areal data (total citations: 5; self-citations: 0; citations by others: 5)
Jin X, Carlin BP, Banerjee S. Biometrics 2005, 61(4): 950-961
In the fields of medicine and public health, a common application of areal data models is the study of geographical patterns of disease. When we have several measurements recorded at each spatial location (for example, information on p ≥ 2 diseases from the same population groups or regions), we need to consider multivariate areal data models in order to handle the dependence among the multivariate components as well as the spatial dependence between sites. In this article, we propose a flexible new class of generalized multivariate conditionally autoregressive (GMCAR) models for areal data, and show how it enriches the MCAR class. Our approach differs from earlier ones in that it directly specifies the joint distribution for a multivariate Markov random field (MRF) through the specification of simpler conditional and marginal models. This in turn leads to a significant reduction in the computational burden in hierarchical spatial random effect modeling, where posterior summaries are computed using Markov chain Monte Carlo (MCMC). We compare our approach with existing MCAR models in the literature via simulation, using average mean square error (AMSE) and a convenient hierarchical model selection criterion, the deviance information criterion (DIC; Spiegelhalter et al., 2002, Journal of the Royal Statistical Society, Series B 64, 583-639). Finally, we offer a real-data application of our proposed GMCAR approach that models lung and esophagus cancer death rates during 1991-1998 in Minnesota counties.

In this paper, we provide an overview of recently developed methods for the analysis of multivariate data that do not necessarily emanate from a normal universe. Multivariate data occur naturally in the life sciences and in other research fields. When drawing inference, it is generally recommended to take the multivariate nature of the data into account, and not merely analyze each variable separately. Furthermore, it is often of major interest to select an appropriate set of important variables. We present contributions in three different, but closely related, research areas: first, a general approach to the comparison of mean vectors, which allows for profile analysis and tests of dimensionality; second, non‐parametric and parametric methods for the comparison of independent samples of multivariate observations; and third, methods for the situation where the experimental units are observed repeatedly, for example, over time, and the main focus is on analyzing different time profiles when the number p of repeated observations per subject is larger than the number n of subjects.

Chi YY, Ibrahim JG. Biometrics 2006, 62(2): 432-445
Joint modeling of longitudinal and survival data is becoming increasingly essential in most cancer and AIDS clinical trials. We propose a likelihood approach to extend both longitudinal and survival components to be multidimensional. A multivariate mixed effects model is presented to explicitly capture two different sources of dependence among longitudinal measures over time as well as dependence between different variables. For the survival component of the joint model, we introduce a shared frailty, which is assumed to have a positive stable distribution, to induce correlation between failure times. The proposed marginal univariate survival model, which accommodates both zero and nonzero cure fractions for the time to event, is then applied to each marginal survival function. The proposed multivariate survival model has a proportional hazards structure for the population hazard, conditionally as well as marginally, when the baseline covariates are specified through a specific mechanism. In addition, the model is capable of dealing with survival functions with different cure rate structures. The methodology is specifically applied to the International Breast Cancer Study Group (IBCSG) trial to investigate the relationship between quality of life, disease-free survival, and overall survival.

Robbins LG. Genetics 2000, 154(1): 13-26
Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi.it/ricerca/dip/bio_evol/sitomlikely/mlikely.html.

In the cluster randomised study design, the data collected have a hierarchical structure and often include multivariate outcomes. We present a flexible modelling strategy that permits several normally distributed outcomes to be analysed simultaneously, in which intervention effects as well as individual-level and cluster-level between-outcome correlations are estimated. This is implemented in a Bayesian framework which has several advantages over a classical approach, for example in providing credible intervals for functions of model parameters and in allowing informative priors for the intracluster correlation coefficients. In order to declare such informative prior distributions, and fit models in which the between-outcome covariance matrices are constrained, priors on parameters within the covariance matrices are required. Careful specification is necessary, however, in order to maintain non-negative definiteness and symmetry between the different outcomes. We propose a novel solution in the case of three multivariate outcomes, and present a modified existing approach and novel alternative for four or more outcomes. The methods are applied to an example of a cluster randomised trial in the prevention of coronary heart disease. The modelling strategy presented would also be useful in other situations involving hierarchical multivariate outcomes.

Ordination is a powerful method for analysing complex data sets but has been largely ignored in sequence analysis. This paper shows how to use principal coordinates analysis to find low-dimensional representations of distance matrices derived from aligned sets of sequences. The method takes a matrix of Euclidean distances between all pairs of sequences and finds a coordinate space where the distances are exactly preserved. The main problem is to find a measure of distance between aligned sequences that is Euclidean. The simplest distance function is the square root of the percentage difference (as measured by identities) between two sequences, where one ignores any positions in the alignment where there is a gap in any sequence. If one does not ignore positions with a gap, the distances cannot be guaranteed to be Euclidean but the deleterious effects are trivial. Two examples of using the method are shown. A set of 226 aligned globins was analysed and the resulting ordination very successfully represents the known patterns of relationship between the sequences. In the other example, a set of 610 aligned 5S rRNA sequences was analysed. Sequence ordinations complement phylogenetic analyses. They should not be viewed as a complete alternative.
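The procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' program: the function names and the toy alignment are hypothetical, and principal coordinates analysis is implemented here as classical double centering followed by an eigendecomposition.

```python
import numpy as np

def sqrt_percent_difference(seqs, gap="-"):
    """Square root of the percentage of differing positions between each pair
    of aligned sequences, ignoring every column where any sequence has a gap."""
    arr = np.array([list(s) for s in seqs])      # shape: (n_sequences, n_columns)
    keep = ~(arr == gap).any(axis=0)             # columns without any gap
    arr = arr[:, keep]
    diff = (arr[:, None, :] != arr[None, :, :]).mean(axis=2) * 100.0
    return np.sqrt(diff)

def pcoa(dist):
    """Principal coordinates of a distance matrix (classical double centering)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 1e-9                    # keep axes with real extent
    coords = eigvecs[:, positive] * np.sqrt(eigvals[positive])
    return coords, eigvals

# Hypothetical toy alignment; the column containing a gap is ignored.
seqs = ["ACGT-A", "ACGTTA", "ACCTTA", "TCCTTA"]
dist = sqrt_percent_difference(seqs)
coords, eigvals = pcoa(dist)
```

Because this distance is Euclidean, the distances between rows of `coords` reproduce `dist`. Negative eigenvalues in `eigvals` would signal a non-Euclidean distance matrix, which is the situation the abstract mentions for alignments where gapped positions are not ignored.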



Progressive advances in the measurement of complex multifactorial components of biological processes involving both spatial and temporal domains have made it difficult to identify the variables (genes, proteins, neurons, etc.) with significantly changed activities in response to a stimulus within large data sets using conventional statistical approaches. The set of all changed variables is termed hot spots. The detection of such hot spots is considered to be an NP-hard problem, but by first establishing its theoretical foundation we have been able to develop an algorithm that provides a solution.

Identification of phenotypic modules, semiautonomous sets of highly correlated traits, can be accomplished through exploratory (e.g., cluster analysis) or confirmatory approaches (e.g., RV coefficient analysis). Although statistically more robust, confirmatory approaches are generally unable to compare across different model structures. For example, RV coefficient analysis finds support for both two‐ and six‐module models for the therian mammalian skull. Here, we present a maximum likelihood approach that takes into account model parameterization. We compare model log‐likelihoods of trait correlation matrices using the finite‐sample corrected Akaike Information Criterion, allowing for comparison of hypotheses across different model structures. Simulations varying model complexity and within‐ and between‐module contrast demonstrate that this method correctly identifies model structure and parameters across a wide range of conditions. We further analyzed a dataset of 3‐D data, consisting of 61 landmarks from 181 macaque (Macaca fuscata) skulls, distributed among five age categories, testing 31 models, including no modularity among the landmarks and various partitions of two, three, six, and eight modules. Our results clearly support a complex six‐module model, with separate within‐ and intermodule correlations. Furthermore, this model was selected for all five age categories, demonstrating that this complex pattern of integration in the macaque skull appears early and is highly conserved throughout postnatal ontogeny. Subsampling analyses demonstrate that this method is robust to relatively low sample sizes, as is commonly encountered in rare or extinct taxa. This new approach allows for the direct comparison of models with different parameterizations, providing an important tool for the analysis of modularity across diverse systems.
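The model comparison step in this abstract rests on the finite-sample corrected Akaike Information Criterion, AICc = -2 ln L + 2k + 2k(k + 1)/(n - k - 1), where k is the number of free parameters and n the sample size. A minimal sketch follows; the log-likelihoods and parameter counts are invented for illustration and are not results from the paper, and the helper names are hypothetical.

```python
def aicc(log_likelihood, k, n):
    """Finite-sample corrected AIC for a model with k free parameters fitted to n samples."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def rank_models(models, n):
    """Return (name, AICc, delta AICc) tuples sorted from best to worst.

    `models` maps a model name to a (log_likelihood, k) pair.
    """
    scored = [(name, aicc(ll, k, n)) for name, (ll, k) in models.items()]
    best = min(score for _, score in scored)
    return sorted(((name, s, s - best) for name, s in scored), key=lambda t: t[1])

# Hypothetical values chosen only to exercise the functions.
candidate_models = {
    "no modularity": (-520.0, 1),
    "two modules": (-505.0, 3),
    "six modules": (-480.0, 22),
}
ranking = rank_models(candidate_models, n=181)
```

The penalty term is what lets models with different numbers of within- and between-module correlation parameters be compared on one scale: a richer model is preferred only if its gain in log-likelihood outweighs its extra parameters.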

Evolutionary biologists have adopted simple likelihood models for purposes of estimating ancestral states and evaluating character independence on specified phylogenies; however, for purposes of estimating phylogenies by using discrete morphological data, maximum parsimony remains the only option. This paper explores the possibility of using standard, well-behaved Markov models for estimating morphological phylogenies (including branch lengths) under the likelihood criterion. An important modification of standard Markov models involves making the likelihood conditional on characters being variable, because constant characters are absent in morphological data sets. Without this modification, branch lengths are often overestimated, resulting in potentially serious biases in tree topology selection. Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses.
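The conditioning step described in this abstract can be written compactly. The notation below is an assumption introduced for illustration (the abstract gives no symbols): each character's likelihood is divided by the probability that a character is variable, which is one minus the total probability of the constant patterns.

```latex
% Assumed notation:
%   x_i    : observed pattern of character i (i = 1, ..., N)
%   \theta : tree topology, branch lengths, and model parameters
%   c_s    : the constant pattern in which every taxon shows state s (s = 1, ..., k)
L_{v}(\theta) \;=\; \prod_{i=1}^{N}
  \frac{P(x_i \mid \theta)}{1 - \sum_{s=1}^{k} P(c_s \mid \theta)}
```

Without the denominator, the absence of constant characters in the data looks like evidence of more change than actually occurred, which is consistent with the overestimated branch lengths the abstract reports.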
