Similar Articles
20 matching articles found.
1.
Longitudinal data are common in clinical trials and observational studies, where missing outcomes due to dropouts are frequently encountered. In this context, under the assumption of missing at random, the weighted generalized estimating equation (WGEE) approach is widely adopted for marginal analysis. Model selection for the marginal mean regression is a crucial aspect of data analysis, and identifying an appropriate correlation structure for model fitting may also be of interest and importance. However, the existing information criteria for model selection in WGEE have limitations, such as separate criteria for selecting the marginal mean and correlation structures and unsatisfactory selection performance in small-sample setups. In particular, few studies have developed joint information criteria for selecting both the marginal mean and correlation structures. In this work, by embedding empirical likelihood into the WGEE framework, we propose two new information criteria, a joint empirical Akaike information criterion and a joint empirical Bayesian information criterion, which simultaneously select the variables for the marginal mean regression and the correlation structure. In extensive simulation studies, these empirical-likelihood-based criteria exhibit robustness and flexibility and outperform other criteria, including the weighted quasi-likelihood under the independence model criterion, the missing longitudinal information criterion, and the joint longitudinal information criterion. In addition, we provide a theoretical justification of the proposed criteria and present two real data examples for further illustration.

2.
Hjort & Claeskens (2003) developed an asymptotic theory for model selection, model averaging and subsequent inference using likelihood methods in parametric models, along with associated confidence statements. In this article, we consider a semiparametric version of this problem, wherein the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model. We show that all the results of Hjort & Claeskens hold in the semiparametric context, if the Fisher information matrix for parametric models is replaced by the semiparametric information bound for semiparametric models, and if maximum likelihood estimators for parametric models are replaced by semiparametric efficient profile estimators. Our methods of proof employ Le Cam's contiguity lemmas, leading to transparent results. The results also describe the behaviour of semiparametric model estimators when the parametric component is misspecified, and also have implications for pointwise-consistent model selectors.

3.
4.
Liang, Hua; Wu, Hulin; Zou, Guohua. Biometrika 2008, 95(3), 773-778.
The conventional model selection criterion, the Akaike information criterion (AIC), has been applied to choose candidate models in mixed-effects models by consideration of the marginal likelihood. Vaida & Blanchard (2005) demonstrated that such a marginal AIC and its small-sample correction are inappropriate when the research focus is on clusters. Correspondingly, these authors suggested the use of a conditional AIC. Their conditional AIC is derived under the assumption that the variance-covariance matrix or scaled variance-covariance matrix of the random effects is known. This note provides a general conditional AIC without these strong assumptions. Simulation studies show that the proposed method is promising.
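The marginal AIC discussed in this note trades goodness of fit against parameter count. As a generic illustration of that trade-off only (ordinary least squares on synthetic data, not a mixed-effects model; the conditional AIC requires effective degrees of freedom the sketch does not compute), a minimal example:

```python
import numpy as np

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2*log-likelihood (smaller is better)."""
    return 2 * n_params - 2 * loglik

def gaussian_loglik(y, y_hat):
    """Gaussian log-likelihood, evaluated at the ML estimate of the error variance."""
    n = len(y)
    sigma2 = np.sum((y - y_hat) ** 2) / n
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)

aics = {}
for name, X in [("intercept-only", np.ones((100, 1))),
                ("with-slope", np.column_stack([np.ones(100), x]))]:
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    aics[name] = aic(gaussian_loglik(y, X @ beta), k)
```

The model containing the true slope attains the lower AIC despite its extra parameter.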

5.
Summary: A large amount of information is contained within the phylogenetic relationships between species. In addition to their branching patterns, it is also possible to examine other aspects of the biology of the species. Here, the influence that deleterious selection might have is determined. The likelihood of different phylogenies in the presence of selection is explored to determine the properties of such a likelihood surface. Calculating likelihoods for a phylogeny in the presence and absence of selection permits the application of a likelihood ratio test to search for selection. It is shown that even a single selected site can have a strong effect on the likelihood. The method is illustrated with an example from Drosophila melanogaster and suggests that deleterious selection may be acting on transposable elements.
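The likelihood ratio test described above compares the maximized log-likelihoods of the with-selection and no-selection models. A minimal sketch of the mechanics, with invented illustrative log-likelihood values (the real values come from the phylogenetic likelihood computation):

```python
from scipy.stats import chi2

# Illustrative values only: maximized log-likelihoods of the two nested models
loglik_no_selection = -1042.7     # null model: no selection
loglik_with_selection = -1036.2   # alternative: one extra selection parameter

lr_stat = 2 * (loglik_with_selection - loglik_no_selection)
p_value = chi2.sf(lr_stat, df=1)  # df = difference in number of parameters
reject_null = p_value < 0.05
```

With these illustrative numbers the statistic is 13.0 and the null of no selection is rejected.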

6.
Despite the increasing opportunity to collect large-scale data sets for population genomic analyses, the use of high-throughput sequencing to study populations of polyploids has seen little application. This is due in large part to problems associated with determining allele copy number in the genotypes of polyploid individuals (allelic dosage uncertainty, ADU), which complicates the calculation of important quantities such as allele frequencies. Here, we describe a statistical model to estimate biallelic SNP frequencies in a population of autopolyploids using high-throughput sequencing data in the form of read counts. We bridge the gap from data collection (using restriction enzyme based techniques [e.g. GBS, RADseq]) to allele frequency estimation in a unified inferential framework using a hierarchical Bayesian model to sum over genotype uncertainty. Simulated data sets were generated under various conditions for tetraploid, hexaploid and octoploid populations to evaluate the model's performance and to help guide the collection of empirical data. We also provide an implementation of our model in the R package polyfreqs and demonstrate its use with two example analyses that investigate (i) levels of expected and observed heterozygosity and (ii) model adequacy. Our simulations show that the number of individuals sampled from a population has a greater impact on estimation error than sequencing coverage. The example analyses also show that our model and software can be used to make inferences beyond the estimation of allele frequencies for autopolyploids by providing assessments of model adequacy and estimates of heterozygosity.
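polyfreqs itself is an R package that uses MCMC over a hierarchical model. As a simplified analogue of the core idea only (summing over each individual's unobserved allele dosage), here is a grid posterior for the allele frequency in an autotetraploid population, assuming Hardy-Weinberg genotype frequencies, error-free binomial read sampling, and a flat prior; read counts are illustrative:

```python
import numpy as np
from scipy.stats import binom

def allele_freq_posterior(ref_reads, tot_reads, ploidy=4, grid_size=999):
    """Grid posterior over the population allele frequency p, summing over
    each individual's unobserved allele dosage g in 0..ploidy.
    Model: g ~ Binomial(ploidy, p) (HWE); ref reads ~ Binomial(n, g/ploidy)."""
    grid = np.linspace(0.001, 0.999, grid_size)
    dosages = np.arange(ploidy + 1)
    log_post = np.zeros(grid_size)  # flat prior on p
    for r, n in zip(ref_reads, tot_reads):
        read_like = binom.pmf(r, n, dosages / ploidy)             # (ploidy+1,)
        dosage_prior = binom.pmf(dosages[:, None], ploidy, grid)  # (ploidy+1, grid)
        log_post += np.log(read_like @ dosage_prior)              # marginalize dosage
    post = np.exp(log_post - log_post.max())
    return grid, post / post.sum()

# Six individuals with 10 reads each; reference-read counts are invented
grid, post = allele_freq_posterior([3, 2, 5, 1, 4, 3], [10] * 6)
p_mean = float(np.sum(grid * post))  # posterior mean allele frequency
```

Increasing the number of individuals narrows this posterior faster than raising per-individual coverage, consistent with the simulation result reported above.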

7.
In model-based phylogenetic analysis of multigene sequences, the application of a different substitution model to each gene (a mixed model) should be considered. However, a single molecular evolution model is still usually applied, and there has been no computer program able to conduct model selection for multiple loci at the same time, though several recently developed programs for phylogenetic inference can handle mixed models. Here, I have developed software named 'kakusan' that solves these problems. The major running steps are briefly described, and results obtained with kakusan are compared to those obtained with other programs.

8.
Numerous Bayesian methods of phenotype prediction and genomic breeding value estimation based on multilocus association models have been proposed. Computationally the methods have been based either on Markov chain Monte Carlo or on faster maximum a posteriori estimation. The demand for more accurate and more efficient estimation has led to the rapid emergence of workable methods, unfortunately at the expense of well-defined principles for Bayesian model building. In this article we go back to the basics and build a Bayesian multilocus association model for quantitative and binary traits with carefully defined hierarchical parameterization of Student's t and Laplace priors. In this treatment we consider alternative model structures, using indicator variables and polygenic terms. We make the most of the conjugate analysis, enabled by the hierarchical formulation of the prior densities, by deriving the fully conditional posterior densities of the parameters and using the acquired known distributions in building fast generalized expectation-maximization estimation algorithms.

9.
10.
Model selection and estimation in the Gaussian graphical model
Yuan, Ming; Lin, Yi. Biometrika 2007, 94(1), 19-35.
We propose penalized likelihood methods for estimating the concentration matrix in the Gaussian graphical model. The methods lead to a sparse and shrinkage estimator of the concentration matrix that is positive definite, and thus conduct model selection and estimation simultaneously. The implementation of the methods is nontrivial because of the positive definite constraint on the concentration matrix, but we show that the computation can be done effectively by taking advantage of the efficient maxdet algorithm developed in convex optimization. We propose a BIC-type criterion for the selection of the tuning parameter in the penalized likelihood methods. The connection between our methods and existing methods is illustrated. Simulations and real examples demonstrate the competitive performance of the new methods.
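An l1-penalized estimator of this kind (the "graphical lasso", a close relative of the penalized-likelihood approach above, solved by coordinate methods rather than the maxdet algorithm) is available in scikit-learn. A small sketch on data drawn from a sparse chain-graph precision matrix:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# True model: a chain graph, so the precision (concentration) matrix is sparse
prec = np.array([[2.0, 0.6, 0.0, 0.0],
                 [0.6, 2.0, 0.6, 0.0],
                 [0.0, 0.6, 2.0, 0.6],
                 [0.0, 0.0, 0.6, 2.0]])
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(prec), size=500)

est = GraphicalLasso(alpha=0.05).fit(X)
P = est.precision_  # sparse, shrunken, and positive definite by construction
```

The penalty simultaneously selects the graph (zeros in P) and estimates the nonzero entries, mirroring the "model selection and estimation simultaneously" point of the abstract.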

11.
Almost all studies that estimate phylogenies from DNA sequence data under the maximum-likelihood (ML) criterion employ an approximate approach. Most commonly, model parameters are estimated on some initial phylogenetic estimate derived using a rapid method (neighbor-joining or parsimony). Parameters are then held constant during a tree search, and ideally, the procedure is repeated until convergence is achieved. However, the effectiveness of this approximation has not been formally assessed, in part because doing so requires computationally intensive, full-optimization analyses. Here, we report both indirect and direct evaluations of the effectiveness of successive approximations. We obtained an indirect evaluation by comparing the results of replicate runs on real data that use random trees to provide initial parameter estimates. For six real data sets taken from the literature, all replicate iterative searches converged to the same joint estimates of topology and model parameters, suggesting that the approximation is not starting-point dependent, as long as the heuristic searches of tree space are rigorous. We conducted a more direct assessment using simulations in which we compared the accuracy of phylogenies estimated using full optimization of all model parameters on each tree evaluated to the accuracy of trees estimated via successive approximations. There is no significant difference between the accuracy of the approximation searches relative to full-optimization searches. Our results demonstrate that successive approximation is reliable and provide reassurance that this much faster approach is safe to use for ML estimation of topology.

12.
Successful pharmaceutical drug development requires finding correct doses. The issues that conventional dose-response analyses consider, namely whether responses are related to doses, which doses have responses differing from a control dose response, the functional form of a dose-response relationship, and the dose(s) to carry forward, do not need to be addressed simultaneously. Determining if a dose-response relationship exists, regardless of its functional form, and then identifying a range of doses to study further may be a more efficient strategy. This article describes a novel estimation-focused Bayesian approach (BMA-Mod) for carrying out the analyses when the actual dose-response function is unknown. Realizations from Bayesian analyses of linear, generalized linear, and nonlinear regression models that may include random effects and covariates other than dose are optimally combined to produce distributions of important secondary quantities, including test-control differences, predictive distributions of possible outcomes from future trials, and ranges of doses corresponding to target outcomes. The objective is similar to the objective of the hypothesis-testing based MCP-Mod approach, but provides more model and distributional flexibility and does not require testing hypotheses or adjusting for multiple comparisons. A number of examples illustrate the application of the method.
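BMA-Mod combines full Bayesian fits of several candidate dose-response models. As a much simpler illustration of the weighting idea only (not the method itself), here are BIC-approximated posterior model probabilities; the candidate names and BIC values are invented for illustration:

```python
import numpy as np

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    assuming equal prior probability for each candidate model."""
    b = np.asarray(bics, dtype=float)
    w = np.exp(-0.5 * (b - b.min()))  # subtract the min for numerical stability
    return w / w.sum()

# Hypothetical BICs for three candidate dose-response shapes
candidates = ["linear", "Emax", "sigmoid-Emax"]
weights = bma_weights([212.4, 208.1, 209.5])
best = candidates[int(np.argmax(weights))]
```

Quantities of interest (test-control differences, target-dose ranges) would then be averaged across models with these weights rather than taken from a single selected model.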

13.
Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. Theory of statistical models is well-established if the set of independent variables to consider is fixed and small. Hence, we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and often we are confronted with the number of candidate variables in the range 10-30. This number is often too large to be considered in a statistical model. We provide an overview of various available variable selection methods that are based on significance or information criteria, penalized likelihood, the change-in-estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of a linear regression model and then transferred to more generalized linear models or models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise stability of a final model, unbiasedness of regression coefficients, and validity of p-values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on application of variable selection methods in general (low-dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities based on resampling the entire variable selection process to be routinely reported by software packages offering automated variable selection algorithms.
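One family in this overview is information-criterion-based selection. A minimal sketch of best-subset selection by AIC on synthetic data (exactly the kind of automated procedure whose post-selection instability the authors caution about; all data and settings are illustrative):

```python
import numpy as np
from itertools import combinations

def gaussian_aic(y, X):
    """AIC of an OLS fit, up to an additive constant shared by all models."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (X.shape[1] + 1)  # +1 for the error variance

rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)  # only x0 and x2 matter

best_score, best_subset = np.inf, ()
for k in range(1, p + 1):
    for subset in combinations(range(p), k):
        Z = np.column_stack([np.ones(n), X[:, list(subset)]])
        score = gaussian_aic(y, Z)
        if score < best_score:
            best_score, best_subset = score, subset
```

Rerunning this on bootstrap resamples and reporting how often each variable is selected is one of the stability investigations the overview recommends.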

14.
15.
The concept of a group is ubiquitous in biology. It underlies classifications in evolution and ecology, including those used to describe phylogenetic levels, the habitat and functional roles of organisms in ecosystems. Surprisingly, this concept is not explicitly included in simple models for the structure of food webs, the ecological networks formed by consumer-resource interactions. We present here the simplest possible model based on groups, and show that it performs substantially better than current models at predicting the structure of large food webs. Our group-based model can be applied to different types of biological and non-biological networks, and for the first time merges in the same framework two important notions in network theory: that of compartments (sets of highly interacting nodes) and that of roles (sets of nodes that have similar interaction patterns). This model provides a basis to examine the significance of groups in biological networks and to develop more accurate models for ecological network structure. It is especially relevant at a time when a new generation of empirical data is providing increasingly large food webs.

16.
This paper discusses regression analysis of failure time data arising from case-cohort periodic follow-up studies. One feature of such data that makes their analysis much more difficult is that they are usually interval-censored rather than right-censored. Although some methods have been developed for general failure time data, there does not seem to exist an established procedure for the situation considered here. To address the problem, we present a semiparametric regularized procedure and develop a simple algorithm for its implementation. In addition, unlike some existing procedures for similar situations, the proposed procedure is shown to have the oracle property, and an extensive simulation study suggests that the presented approach works well in practical situations. The method is applied to the HIV vaccine trial that motivated this study.

17.
Gompertz growth curves were fitted to the data of 137 rabbits from control (C) and selected (S) lines. The animals came from a synthetic rabbit line selected for an increased growth rate. The embryos from generations 3 and 4 were frozen and thawed to be contemporary of rabbits born in generation 10. Group C was the offspring of generations 3 and 4, and group S was the contemporary offspring of generation 10. The animals were weighed individually twice a week during the first four weeks of life, and once a week thereafter, until 20 weeks of age. Subsequently, the males were weighed weekly until 40 weeks of age. The random samples of the posterior distributions of the growth curve parameters were drawn by using Markov Chain Monte Carlo (MCMC) methods. As a consequence of selection, the selected animals were heavier than the C animals throughout the entire growth curve. Adult body weight, estimated as a parameter of the Gompertz curve, was 7% higher in the selected line. The other parameters of the Gompertz curve were scarcely affected by selection. When selected and control growth curves are represented in a metabolic scale, all differences disappear.
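The study estimates the Gompertz parameters by MCMC; as a simpler sketch of the curve itself, here is a least-squares fit to one synthetic weight trajectory (parameter values and the 0.05 kg noise level are invented for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, k):
    """Gompertz growth curve: a = adult weight, b = shape, k = rate."""
    return a * np.exp(-b * np.exp(-k * t))

# Synthetic weekly weights (kg) for one animal over 20 weeks
t = np.arange(1, 21, dtype=float)
rng = np.random.default_rng(0)
w = gompertz(t, 4.0, 3.0, 0.25) + rng.normal(scale=0.05, size=t.size)

params, _ = curve_fit(gompertz, t, w, p0=[3.0, 2.0, 0.2])
a_hat, b_hat, k_hat = params  # a_hat estimates the adult body weight
```

The asymptote parameter a plays the role of adult body weight, the quantity found to be 7% higher in the selected line.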

18.
This paper considers inference methods for case-control logistic regression in longitudinal setups. The motivation is provided by an analysis of plains bison spatial location as a function of habitat heterogeneity. The sampling is done according to a longitudinal matched case-control design in which, at certain time points, exactly one case, the actual location of an animal, is matched to a number of controls, the alternative locations that could have been reached. We develop inference methods for the conditional logistic regression model in this setup, which can be formulated within a generalized estimating equation (GEE) framework. This permits the use of statistical techniques developed for GEE-based inference, such as robust variance estimators and model selection criteria adapted for non-independent data. The performance of the methods is investigated in a simulation study and illustrated with the bison data analysis.

19.
20.