Similar articles
1.
Summary. Frailty models are widely used to model clustered survival data. Classical ways to fit frailty models are likelihood-based. We propose an alternative approach in which the original problem of "fitting a frailty model" is reformulated into the problem of "fitting a linear mixed model" using model transformation. We show that the transformation idea also works for multivariate proportional odds models and for multivariate additive risks models. It therefore bridges segregated methodologies, as it provides a general way to fit conditional models for multivariate survival data by using mixed models methodology. To study the specific features of the proposed method we focus on frailty models. Based on a simulation study, we show that the proposed method provides a good and simple alternative for fitting frailty models for data sets with a sufficiently large number of clusters and moderate to large sample sizes within covariate-level subgroups in the clusters. The proposed method is applied to data from 27 randomized trials in advanced colorectal cancer, which are available through the Meta-Analysis Group in Cancer.

2.
Joint regression analysis of correlated data using Gaussian copulas   Cited by: 2 (self-citations: 0, citations by others: 2)
Song PX  Li M  Yuan Y 《Biometrics》2009,65(1):60-68
Summary. This article concerns a new joint modeling approach for correlated data analysis. Utilizing Gaussian copulas, we present a unified and flexible machinery to integrate separate one-dimensional generalized linear models (GLMs) into a joint regression analysis of continuous, discrete, and mixed correlated outcomes. This essentially leads to a multivariate analogue of the univariate GLM theory and hence an efficiency gain in the estimation of regression coefficients. The availability of joint probability models enables us to develop a full maximum likelihood inference. Numerical illustrations are focused on regression models for discrete correlated data, including multidimensional logistic regression models and a joint model for mixed normal and binary outcomes. In the simulation studies, the proposed copula-based joint model is compared to the popular generalized estimating equations, a moment-based estimating equation method for joining univariate GLMs. Two real-world data examples are used in the illustration.

3.
Objectives: A twin‐based comparative study on the genetic influences in metabolic endophenotypes in two populations of substantial ethnic, environmental, and cultural differences was performed. Design and Methods: Data on 11 metabolic phenotypes including anthropometric measures, blood glucose, and lipid levels as well as blood pressure were available from 756 pairs of Danish twins (309 monozygotic and 447 dizygotic twin pairs) with a mean age of 38 years (range: 18‐67) and from 325 pairs of Chinese twins (183 monozygotic and 142 dizygotic twin pairs) with a mean age of 40.5 years (range: 18‐69). Twin modeling was performed on full and nested models with the best fitting models selected. Results: Heritability estimates were compared between Danish and Chinese samples to identify differential genetic influences on each of the phenotypes. Except for hip circumference, all other body measures exhibited similar heritability patterns in the two samples, with body weight showing only a slight difference. Higher genetic influences were estimated for fasting blood glucose level in Chinese twins, whereas the Danish twins showed more genetic regulation over most lipid phenotypes. Systolic blood pressure was more genetically controlled in Danish than in Chinese twins. Conclusions: Metabolic endophenotypes show disparity in their genetic determinants in populations under distinct environmental conditions.

4.
In the present study, a strategy was proposed for constructing plant core subsets by clusters based on the combination of continuous data for genotypic values and discrete data for molecular marker information. A mixed linear model approach was used to predict genotypic values for eliminating the environment effect. The "mixed genetic distance" was designed to solve the difficult problem of combining continuous and discrete data to construct a core subset by cluster. Four commonly used genetic distances for continuous data (Euclidean distance, standardized Euclidean distance, city block distance, and Mahalanobis distance) were used to assess the validity of the continuous data part of the mixed genetic distance; three commonly used genetic distances for discrete data (cosine distance, correlation distance, and Jaccard distance) were used to assess the validity of the discrete data part of the mixed genetic distance. A rice germplasm group with eight quantitative traits and information for 60 molecular markers was used to evaluate the validity of the new strategy. The results suggest that the validity of both parts of the mixed genetic distance is equal to or higher than that of the common genetic distance. The core subset constructed on the basis of a combination of data for genotypic values and molecular marker information was more representative than that constructed on the basis of data from genotypic values or molecular marker information alone. Moreover, the strategy of using combined data was able to treat dominant marker information and could combine any other continuous data and discrete data together to perform clustering to construct a plant core subset.
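
The abstract above does not say how the continuous and discrete parts of the "mixed genetic distance" are combined. The following Python sketch is therefore only an illustration under stated assumptions: the weighted sum, the `weight` parameter, the rescaling by `max_trait_dist`, and all function names are hypothetical and are not taken from the paper. It pairs a Euclidean distance on trait values with a Jaccard distance on 0/1 marker scores, two of the distances the abstract names.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two vectors of genotypic trait values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def jaccard_distance(m1, m2):
    """Jaccard distance between two 0/1 marker profiles."""
    both = sum(1 for a, b in zip(m1, m2) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(m1, m2) if a == 1 or b == 1)
    # Two profiles with no bands at all are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def mixed_distance(traits1, markers1, traits2, markers2,
                   max_trait_dist, weight=0.5):
    """Hypothetical mixed distance between two accessions.

    ASSUMPTION: a weighted sum of the two parts, with the continuous part
    divided by max_trait_dist (the caller's chosen scale, e.g. the largest
    pairwise trait distance) so that both parts are on a comparable scale.
    """
    cont = euclidean(traits1, traits2) / max_trait_dist
    disc = jaccard_distance(markers1, markers2)
    return weight * cont + (1.0 - weight) * disc
```

A pairwise matrix of such distances could then be passed to any standard clustering routine; how the paper weights or normalizes the two parts would need to be checked against the full text.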

5.
The augmentation of categorical outcomes with underlying Gaussian variables in bivariate generalized mixed effects models has facilitated the joint modeling of continuous and binary response variables. These models typically assume that random effects and residual effects (co)variances are homogeneous across all clusters and subjects, respectively. Motivated by conflicting evidence about the association between performance outcomes in dairy production systems, we consider the situation where these (co)variance parameters may themselves be functions of systematic and/or random effects. We present a hierarchical Bayesian extension of bivariate generalized linear models whereby functions of the (co)variance matrices are specified as linear combinations of fixed and random effects following a square‐root‐free Cholesky reparameterization that ensures necessary positive semidefinite constraints. We test the proposed model by simulation and apply it to the analysis of a dairy cattle data set in which the random herd‐level and residual cow‐level effects (co)variances between a continuous production trait and binary reproduction trait are modeled as functions of fixed management effects and random cluster effects.
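
To illustrate why a square-root-free Cholesky reparameterization keeps a covariance matrix valid, here is a minimal Python sketch for the 2 x 2 case. It writes the matrix as L D Lᵀ with a unit lower-triangular L and a positive diagonal D. The function name and the choice to pass log-variances as plain numbers are assumptions made for illustration; in the abstract, such unconstrained quantities are modeled as linear combinations of fixed and random effects, which is not reproduced here.

```python
import math


def covariance_from_ldl(log_d1, log_d2, l21):
    """Build a 2 x 2 covariance matrix as L D L^T.

    L = [[1, 0], [l21, 1]] is unit lower triangular and
    D = diag(exp(log_d1), exp(log_d2)) has strictly positive entries,
    so any real inputs give a symmetric positive definite matrix.
    """
    d1 = math.exp(log_d1)
    d2 = math.exp(log_d2)
    return [
        [d1, l21 * d1],
        [l21 * d1, l21 * l21 * d1 + d2],
    ]
```

Because the three inputs are unconstrained, they can each be given a linear predictor without any risk of producing an invalid covariance matrix, which is the practical point of the reparameterization.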

6.
Huggins R 《Biometrics》2000,56(2):537-545
In the study of longitudinal twin and family data, interest is often in the covariance structure of the data and the decomposition of this covariance structure into genetic and environmental components rather than in estimating the mean function. Various parametric models for covariance structures have been proposed but, e.g., in studies of children where growth spurts occur at various ages, it is difficult to a priori determine an appropriate parametric model for the covariance structure. In particular, there is a general lack of the visualization procedures, such as lowess, that are invaluable in the initial stages of constructing a parametric model for a mean function. Here we use kernel smoothing to modify a cross-sectional approach based on the sample covariance matrices to obtain smoothed estimates of the genetic and environmental variances and correlations for longitudinal twin data. The methods are proposed to be exploratory as an aid to parametric modeling rather than inferential, although approximate asymptotic standard errors are derived in the Appendix.

7.
It is shown that maximum likelihood estimation of variance components from twin data can be parameterized in the framework of linear mixed models. Standard statistical packages can be used to analyze univariate or multivariate data for simple models such as the ACE and CE models. Furthermore, specialized variance component estimation software that can handle pedigree data and user-defined covariance structures can be used to analyze multivariate data for simple and complex models, including those where dominance and/or QTL effects are fitted. The linear mixed model framework is particularly useful for analyzing multiple traits in extended (twin) families with a large number of random effects.
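
As a reminder of what the ACE model implies for twin pairs, the sketch below writes out the standard expected 2 x 2 covariance matrix of a trait within a pair: additive genetic effects are shared fully by monozygotic (MZ) twins and, on average, half by dizygotic (DZ) twins, while common environment is shared fully by both. The function names are illustrative only, and the sketch shows the covariance structure a mixed model would have to encode, not the estimation procedure of the abstract above.

```python
def ace_pair_covariance(var_a, var_c, var_e, zygosity):
    """Expected trait covariance matrix for one twin pair under ACE."""
    if zygosity not in ("MZ", "DZ"):
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    # Genetic relatedness: 1 for MZ pairs, 1/2 for DZ pairs.
    relatedness = 1.0 if zygosity == "MZ" else 0.5
    total = var_a + var_c + var_e
    between_twins = relatedness * var_a + var_c
    return [[total, between_twins], [between_twins, total]]


def heritability(var_a, var_c, var_e):
    """Narrow-sense heritability: share of total variance that is additive genetic."""
    return var_a / (var_a + var_c + var_e)
```

Dropping `var_a` (setting it to zero) gives the CE model mentioned in the abstract, in which MZ and DZ pairs have the same expected covariance.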

8.
Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.

9.
Strategies for genetic mapping of categorical traits   Cited by: 3 (self-citations: 0, citations by others: 3)
Shaoqi Rao  Xia Li 《Genetica》2000,109(3):183-197
The search for efficient and powerful statistical methods and optimal mapping strategies for categorical traits under various experimental designs continues to be one of the main tasks in genetic mapping studies. Methodologies for genetic mapping of categorical traits can generally be classified into two groups, linear and non-linear models. We develop a method based on a threshold model, termed the mixture threshold model, to handle ordinal (or binary) data from multiple families. Monte Carlo simulations are done to compare the statistical efficiency and properties of the proposed non-linear model with those of a linear model for genetic mapping of categorical traits using multiple families. The mixture threshold model has notably higher statistical power than linear models. There may be an optimal sampling strategy (family size vs number of families) in which genetic mapping reaches its maximal power and minimal estimation errors. A single large-sibship family does not necessarily produce the maximal power for detection of quantitative trait loci (QTL) due to genetic sampling of QTL alleles. The QTL allelic model has a marked impact on efficiency of genetic mapping of categorical traits in terms of statistical power and QTL parameter estimation. Compared with a fixed number of QTL alleles (two or four), the model with an infinite number of QTL alleles and normally distributed allelic effects results in loss of statistical power. The results imply that inbred designs (e.g. F2 or four-way crosses) with a few QTL alleles segregating, or reducing the number of QTL alleles (e.g. by selection) in outbred populations, are desirable in genetic mapping of categorical traits using data from multiple families. This revised version was published online in July 2006 with corrections to the Cover Date.

10.
Path analysis in genetic epidemiology: a critique   Cited by: 3 (self-citations: 2, citations by others: 1)
Path analysis, a form of general linear structural equation models, is used in studies of human genetics data to discern genetic, environmental, and cultural factors contributing to familial resemblance. It postulates a set of linear and additive parametric relationships between phenotypes and genetic and cultural variables and then essentially uses the assumption of multivariate normality to estimate and perform tests of hypothesis on parameters. Such an approach has been advocated for the analysis of genetic epidemiological data by D. C. Rao, N. Morton, C. R. Cloninger, L. J. Eaves, and W. E. Nance, among others. This paper reviews and evaluates the formulations, assumptions, methodological procedures, interpretations, and applications of path analysis. To give perspective, we begin with a discussion of path analysis as it occurs in the form of general linear causal models in several disciplines of the social sciences. Several specific path analysis models applied to lipoprotein concentrations, IQ, and twin data are then reviewed to keep the presentation self-contained. The bulk of the critical discussion that follows is directed toward the following four facets of path analysis: (1) coherence of model specification and applicability to data; (2) plausibility of modeling assumptions; (3) interpretability and utility of the model; and (4) validity of statistical and computational procedures. In the concluding section, a brief discussion of the problem of appropriate model selection is presented, followed by a number of suggestions of essentially model-free alternative methods of use in the treatment of complex structured data such as occur in genetic epidemiology.

11.
Summary. L-splines are a large family of smoothing splines defined in terms of a linear differential operator. This article develops L-splines within the context of linear mixed models and uses the resulting mixed model L-spline to analyze longitudinal data from a grassland experiment. In the spirit of time-series analysis, a periodic mixed model L-spline is developed, which partitions data into a smooth periodic component plus smooth long-term trend.

12.
Summary. Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered as a special type of measurement error, for categorical data. Nevertheless, we treat misclassification as a different problem from measurement error because statistical models for them are different. Indeed, in the literature, methods for these three problems were generally proposed separately given that statistical modeling for them is very different. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. This EEE approach can be easily implemented and its asymptotic covariance can be obtained by sandwich estimation. Intensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density.

13.
Auxiliary covariate data are often collected in biomedical studies when the primary exposure variable is only assessed on a subset of the study subjects. In this study, we investigate a semiparametric‐estimated likelihood estimation for the generalized linear mixed models (GLMM) in the presence of a continuous auxiliary variable. We use a kernel smoother to handle continuous auxiliary data. The method can be used to deal with missing or mismeasured covariate data problems in a variety of applications when an auxiliary variable is available and cluster sizes are not too small. Simulation study results show that the proposed method performs better than a method that ignores the random effects in GLMM and a method that uses only the data in the validation data set. We illustrate the proposed method with a real data set from a recent environmental epidemiology study on the maternal serum 1,1‐dichloro‐2,2‐bis(p‐chlorophenyl) ethylene level in relationship to preterm births.

14.
Wang X  Guo X  He M  Zhang H 《Biometrics》2011,67(3):987-995
Analysis of data from twin and family studies provides the foundation for studies of disease inheritance. The development of advanced theory and computational software for general linear models has generated considerable interest for using mixed-effect models to analyze twin and family data, as a computationally more convenient and theoretically more sound alternative to the classical structural equation modeling. Despite the long history of twin and family data analysis, some fundamental questions remain unanswered. We addressed two important issues. One is to determine the necessary and sufficient conditions for the identifiability in the mixed-effects models for twin and family data. The other is to derive the asymptotic distribution of the likelihood ratio test, which is novel due to the fact that the standard regularity conditions are not satisfied. We considered a series of specific yet important examples in which we demonstrated how to formulate mixed-effect models to appropriately reflect the data, and our key idea is the use of the Cholesky decomposition. Finally, we applied our method and theory to provide a more precise estimate of the heritability of two data sets than the previously reported estimate.

15.
Salway R  Wakefield J 《Biometrics》2008,64(2):620-626
Summary. This article considers the modeling of single-dose pharmacokinetic data. Traditionally, so-called compartmental models have been used to analyze such data. Unfortunately, the mean functions of such models are sums of exponentials for which inference and computation may not be straightforward. We present an alternative to these models based on generalized linear models, for which desirable statistical properties exist, with a logarithmic link and gamma distribution. The latter has a constant coefficient of variation, which is often appropriate for pharmacokinetic data. Inference is convenient from either a likelihood or a Bayesian perspective. We consider models for both single and multiple individuals, the latter via generalized linear mixed models. For single individuals, Bayesian computation may be carried out with recourse to simulation. We describe a rejection algorithm that, unlike Markov chain Monte Carlo, produces independent samples from the posterior and allows straightforward calculation of Bayes factors for model comparison. We also illustrate how prior distributions may be specified in terms of model-free pharmacokinetic parameters of interest. The methods are applied to data from 12 individuals following administration of the antiasthmatic agent theophylline.
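
To make the "gamma distribution with logarithmic link" idea concrete, the following Python sketch fits such a GLM by iteratively reweighted least squares (IRLS) for a single individual. It is a likelihood-style illustration only, not the rejection algorithm or Bayesian analysis described above, and the linear predictor (intercept plus a slope in time) is a hypothetical simplification chosen for brevity; the function name and arguments are likewise assumptions. For the gamma family with a log link the IRLS weights are constant, so every iteration reduces to an ordinary least-squares fit of the working response on time.

```python
import math


def fit_gamma_log_glm(times, conc, n_iter=25, tol=1e-10):
    """Fit log(E[conc]) = b0 + b1 * time for a gamma GLM with log link.

    ASSUMPTION: a two-parameter linear predictor in time, used here only
    to keep the sketch short. All concentrations must be positive.
    """
    n = len(times)
    t_bar = sum(times) / n
    s_tt = sum((t - t_bar) ** 2 for t in times)
    # Start from the data themselves: mu = y, so eta = log(y).
    eta = [math.log(y) for y in conc]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response; weights are all 1 for gamma + log link.
        z = [e + (y - m) / m for e, y, m in zip(eta, conc, mu)]
        z_bar = sum(z) / n
        new_b1 = sum((t - t_bar) * (zi - z_bar)
                     for t, zi in zip(times, z)) / s_tt
        new_b0 = z_bar - new_b1 * t_bar
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * t for t in times]
        if converged:
            break
    return b0, b1
```

Under this simplified predictor, `-b1` plays the role of an elimination rate constant; richer predictors would be needed to capture an absorption phase.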

16.
This paper describes an analysis of systolic blood pressure (SBP) in the Genetic Analysis Workshop 13 (GAW13) simulated data. The main aim was to assess evidence for both general and specific genetic effects on the baseline blood pressure and on the rate of change (slope) of blood pressure with time. Generalized linear mixed models were fitted using Gibbs sampling in WinBUGS, and the additive polygenic random effects estimated using these models were then used as continuous phenotypes in a variance components linkage analysis. The first-stage analysis provided evidence for general genetic effects on both the baseline and slope of blood pressure, and the linkage analysis found evidence of several genes, again for both baseline and slope.

17.
In order to study family‐based association in the presence of linkage, we extend a generalized linear mixed model proposed for genetic linkage analysis (Lebrec and van Houwelingen (2007), Human Heredity 64, 5–15) by adding a genotypic effect to the mean. The corresponding score test is a weighted family‐based association test statistic, where the weight depends on the linkage effect and on other genetic and shared environmental effects. For testing of genetic association in the presence of gene–covariate interaction, we propose a linear regression method where the family‐specific score statistic is regressed on family‐specific covariates. Both statistics are straightforward to compute. Simulation results show that adjusting the weight for the within‐family variance structure may be a powerful approach in the presence of environmental effects. The test statistic for genetic association in the presence of gene–covariate interaction improved the power for detecting association. For illustration, we analyze the rheumatoid arthritis data from GAW15. Adjusting for smoking and anti‐cyclic citrullinated peptide increased the significance of the association with the DR locus.

18.
Chen Q  Ibrahim JG 《Biometrics》2006,62(1):177-184
We consider a class of semiparametric models for the covariate distribution and missing data mechanism for missing covariate and/or response data for general classes of regression models including generalized linear models and generalized linear mixed models. Ignorable and nonignorable missing covariate and/or response data are considered. The proposed semiparametric model can be viewed as a sensitivity analysis for model misspecification of the missing covariate distribution and/or missing data mechanism. The semiparametric model consists of a generalized additive model (GAM) for the covariate distribution and/or missing data mechanism. Penalized regression splines are used to express the GAMs as a generalized linear mixed effects model, in which the variance of the corresponding random effects provides an intuitive index for choosing between the semiparametric and parametric model. Maximum likelihood estimates are then obtained via the EM algorithm. Simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial is analyzed using the proposed methods.

19.
Twin studies have been adopted for decades to disentangle the relative genetic and environmental contributions for a wide range of traits. However, heritability estimation based on the classical twin models does not take into account dynamic behavior of the variance components over age. Varying variance of the genetic component over age can imply the existence of gene–environment (G × E) interactions that general genome-wide association studies (GWAS) fail to capture, which may lead to the inconsistency of heritability estimates between twin design and GWAS. Existing parametric G × E interaction models for twin studies are limited by assuming a linear or quadratic form of the variance curves with respect to a moderator, which can be overly restrictive in reality. Here we propose spline-based approaches to explore the variance curves of the genetic and environmental components. We choose the additive genetic, common, and unique environmental variance components (ACE) model as the starting point. We treat the component variances as variance functions with respect to age modeled by B-splines or P-splines. We develop an empirical Bayes method to estimate the variance curves together with their confidence bands and provide an R package for public use. Our simulations demonstrate that the proposed methods accurately capture dynamic behavior of the component variances in terms of mean square errors with a data set of >10,000 twin pairs. Using the proposed methods as an alternative and major extension to the classical twin models, our analyses with a large-scale Finnish twin data set (19,510 MZ twins and 27,312 DZ same-sex twins) discover that the variances of the A, C, and E components for body mass index (BMI) change substantially across life span in different patterns and the heritability of BMI drops to ∼50% after middle age. The results further indicate that the decline of heritability is due to increasing unique environmental variance, which provides more insights into age-specific heritability of BMI and evidence of G × E interactions. These findings highlight the fundamental importance and implication of the proposed models in facilitating twin studies to investigate the heritability specific to age and other modifying factors.

20.
Simulated data were used to determine the properties of multivariate prediction of breeding values for categorical and continuous traits using phenotypic, molecular genetic and pedigree information by mixed linear-threshold animal models via Gibbs sampling. Simulation parameters were chosen such that the data resembled situations encountered in Warmblood horse populations. Genetic evaluation was performed in the context of the radiographic findings in the equine limbs. The simulated pedigree comprised seven generations and 40 000 animals per generation. The simulated data included additive genetic values, residuals and fixed effects for one continuous trait and liabilities of four binary traits. For one of the binary traits, quantitative trait locus (QTL) effects and genetic markers were simulated, with three different scenarios with respect to recombination rate (r) between genetic markers and QTL and polymorphism information content (PIC) of genetic markers being studied: r = 0.00 and PIC = 0.90 (r0p9), r = 0.01 and PIC = 0.90 (r1p9), and r = 0.00 and PIC = 0.70 (r0p7). For each scenario, 10 replicates were sampled from the simulated horse population, and six different data sets were generated per replicate. Data sets differed in number and distribution of animals with trait records and the availability of genetic marker information. Breeding values were predicted via Gibbs sampling using a Bayesian mixed linear-threshold animal model with residual covariances fixed to zero and a proper prior for the genetic covariance matrix. Relative breeding values were used to investigate expected response to multi- and single-trait selection. In the sires with 10 or more offspring with trait information, correlations between true and predicted breeding values ranged between 0.89 and 0.94 for the continuous trait and between 0.39 and 0.77 for the binary traits. Proportions of successful identification of sires of average, favourable and unfavourable genetic value were 81% to 86% for the continuous trait and 57% to 74% for the binary traits in these sires. Expected decrease of prevalence of the QTL trait was 3% to 12% after multi-trait selection for all binary traits and 9% to 17% after single-trait selection for the QTL trait. The combined use of phenotype and genotype data was superior to the use of phenotype data alone. It was concluded that information on phenotypes and highly informative genetic markers should be used for prediction of breeding values in mixed linear-threshold animal models via Gibbs sampling to achieve maximum reduction in prevalences of binary traits.
