Similar literature
 20 similar articles found (search time: 31 ms)
1.
Albert PS, Follmann DA, Wang SA, Suh EB. Biometrics 2002, 58(3):631-642
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data is subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.
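To make the idea of a shared latent process concrete, the following is a minimal simulation sketch, not the authors' model. It assumes logistic links for both the response and the missingness indicator, and the function name `simulate_subject` and the parameters `rho`, `sigma`, `beta0`, `gamma0`, and `gamma1` are hypothetical names introduced only for illustration.

```python
import math
import random


def simulate_subject(n_visits, rho, sigma, beta0, gamma0, gamma1, rng):
    """Simulate one subject's binary responses with informative missingness.

    A latent Gaussian AR(1) process b_t drives both the response and the
    missingness indicator. The logistic links and the parameter names are
    illustrative assumptions, not the specification used in the paper.
    Requires abs(rho) < 1 so that the process is stationary.
    """
    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Stationary start: Var(b_1) = sigma^2 / (1 - rho^2).
    b = rng.gauss(0.0, sigma / math.sqrt(1.0 - rho ** 2))
    observed = []
    for _ in range(n_visits):
        # Binary response depends on the latent value.
        y = 1 if rng.random() < expit(beta0 + b) else 0
        # Missingness depends on the same latent value (gamma1 != 0
        # is what makes the missingness informative in this sketch).
        missing = rng.random() < expit(gamma0 + gamma1 * b)
        observed.append(None if missing else y)
        # AR(1) update of the latent process.
        b = rho * b + rng.gauss(0.0, sigma)
    return observed
```

Calling `simulate_subject(20, 0.8, 1.0, -0.5, -1.0, 1.0, random.Random(0))` returns a list of 20 entries, each `0`, `1`, or `None` for a missed visit. Setting `gamma1` to zero in this sketch would make the missingness independent of the latent process.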

2.
Estimation and Prediction With HIV-Treatment Interruption Data   (total citations: 1; self-citations: 0; citations by others: 1)
We consider longitudinal clinical data for HIV patients undergoing treatment interruptions. We use a nonlinear dynamical mathematical model to fit individual patient data. A statistically based censored data method is combined with inverse problem techniques to estimate dynamic parameters. The predictive capabilities of this approach are demonstrated by comparing simulations based on estimation of parameters using only half of the longitudinal observations to the full longitudinal data sets.

3.
Satten GA, Janssen R, Busch MP, Datta S. Biometrics 1999, 55(4):1224-1227
Disease incidence (new cases of disease per person per year) is usually measured by using longitudinal data. However, several recent proposals for measuring the incidence of human immunodeficiency virus (HIV) rely on cross-sectional data only. These methods assume each person is only sampled once; however, in some instances, it is necessary to consider these cross-sectional methods when individuals are represented more than once in the survey sample. We derive an extension of the cross-sectional incidence estimator that is valid for data from repeatedly screened populations and show under what conditions our new estimator reduces to the old estimator. An example involving estimation of HIV incidence among repeat blood donors is presented.

4.
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates if the time patterns of gene sets significantly differ according to these conditions. The interest of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing as well as a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA allowed the identification of 4 gene sets that were found to be linked with the influenza vaccine as well, although previous analyses had associated them with the pneumococcal vaccine only. In our simulation study, TcGSA exhibits good statistical properties and increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available to the community through an R package.

5.

Background

HIV surveillance of generalised epidemics in Africa primarily relies on prevalence at antenatal clinics, but estimates of incidence in the general population would be more useful. Repeated cross-sectional measures of HIV prevalence are now becoming available for general populations in many countries, and we aim to develop and validate methods that use these data to estimate HIV incidence.

Methods and Findings

Two methods were developed that decompose observed changes in prevalence between two serosurveys into the contributions of new infections and mortality. Method 1 uses cohort mortality rates, and method 2 uses information on survival after infection. The performance of these two methods was assessed using simulated data from a mathematical model and actual data from three community-based cohort studies in Africa. Comparison with simulated data indicated that these methods can accurately estimate incidence rates and changes in incidence in a variety of epidemic conditions. Method 1 is simple to implement but relies on locally appropriate mortality data, whilst method 2 can make use of the same survival distribution in a wide range of scenarios. The estimates from both methods are within the 95% confidence intervals of almost all actual measurements of HIV incidence in adults and young people, and the patterns of incidence over age are correctly captured.

Conclusions

It is possible to estimate incidence from cross-sectional prevalence data with sufficient accuracy to monitor the HIV epidemic. Although these methods will theoretically work in any context, we have been able to test them only in southern and eastern Africa, where HIV epidemics are mature and generalised. The choice of method will depend on the local availability of HIV mortality data.
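The decomposition of a prevalence change into new infections and mortality can be illustrated with a toy accounting sketch. This is not the authors' Method 1 or Method 2. It assumes a closed cohort followed between two surveys and assumes that people infected between surveys survive to the second survey like uninfected people. The function name and all parameter names are hypothetical.

```python
import math


def incidence_from_prevalence(p1, p2, surv_pos, surv_neg, years):
    """Back out an incidence rate from two prevalence measurements.

    p1, p2   : HIV prevalence at the first and second survey
    surv_pos : probability that a person infected at survey 1 survives to survey 2
    surv_neg : probability that a person uninfected at survey 1 survives to survey 2
    years    : time between the two surveys

    Simplifying assumptions (ours, for illustration only): closed cohort,
    and people infected between surveys survive like uninfected people.
    """
    alive_pos = p1 * surv_pos            # infected at survey 1, still alive
    alive_neg = (1.0 - p1) * surv_neg    # uninfected at survey 1, still alive
    # p2 = (alive_pos + c * alive_neg) / (alive_pos + alive_neg); solve for c,
    # the cumulative incidence among those uninfected at survey 1.
    c = (p2 * (alive_pos + alive_neg) - alive_pos) / alive_neg
    if not 0.0 <= c < 1.0:
        raise ValueError("inputs imply a cumulative incidence outside [0, 1)")
    # Convert cumulative incidence to a constant rate per person-year.
    return -math.log(1.0 - c) / years
```

In this sketch, a prevalence that stays flat between surveys still implies ongoing new infections whenever infected people die faster than uninfected people, which is why mortality information is needed at all.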

6.
Using the simulated data set from Genetic Analysis Workshop 13, we explored the advantages of using longitudinal data in genetic analyses. The weighted averages of the longitudinal data for each of seven quantitative phenotypes were computed and analyzed. Genome screen results were then compared for these longitudinal phenotypes and the results obtained using two cross-sectional designs: data collected near a single age (45 years) and data collected at a single time point. Significant linkage was obtained for nine regions (LOD scores ranging from 5.5 to 34.6) for six of the phenotypes. Using cross-sectional data, LOD scores were slightly lower for the same chromosomal regions, with two regions becoming nonsignificant and one additional region being identified. The magnitude of the LOD score was highly correlated with the heritability of each phenotype as well as the proportion of phenotypic variance due to that locus. There were no false-positive linkage results using the longitudinal data and three false-positive findings using the cross-sectional data. The three false-positive results appear to be due to the kurtosis in the trait distribution, even after removing extreme outliers. Our analyses demonstrated that the use of simple longitudinal phenotypes was a powerful means to detect genes of major to moderate effect on trait variability. In only one instance were the power and heritability of the trait increased by using data from one examination. Power to detect linkage can be improved by identifying the most heritable phenotype, ensuring normality of the trait distribution and maximizing the information utilized through novel longitudinal designs for genetic analysis.

7.
The rates of escape and reversion in response to selection pressure arising from the host immune system, notably the cytotoxic T-lymphocyte (CTL) response, are key factors determining the evolution of HIV. Existing methods for estimating these parameters from cross-sectional population data using ordinary differential equations (ODEs) ignore information about the genealogy of sampled HIV sequences, which has the potential to cause systematic bias and overestimate certainty. Here, we describe an integrated approach, validated through extensive simulations, which combines genealogical inference and epidemiological modelling, to estimate rates of CTL escape and reversion in HIV epitopes. We show that there is substantial uncertainty about rates of viral escape and reversion from cross-sectional data, which arises from the inherent stochasticity in the evolutionary process. By application to empirical data, we find that point estimates of rates from a previously published ODE model and the integrated approach presented here are often similar, but can also differ several-fold depending on the structure of the genealogy. The model-based approach we apply provides a framework for the statistical analysis and hypothesis testing of escape and reversion in population data and highlights the need for longitudinal and denser cross-sectional sampling to enable accurate estimation of these key parameters.

8.
Macgregor S, Knott SA, White I, Visscher PM. Genetics 2005, 171(3):1365-1376
There is currently considerable interest in genetic analysis of quantitative traits such as blood pressure and body mass index. Although these traits change throughout life, they are commonly analyzed only at a single time point. The genetic basis of such traits can be better understood by collecting and effectively analyzing longitudinal data. Analyses of these data are complicated by the need to incorporate information from complex pedigree structures and genetic markers. We propose conducting longitudinal quantitative trait locus (QTL) analyses on such data sets by using a flexible random regression estimation technique. The relationship between genetic effects at different ages is efficiently modeled using covariance functions (CFs). Using simulated data, we show that the change in genetic effects over time can be well characterized using CFs and that including parameters to model the change in effect with age can provide substantial increases in power to detect QTL compared with repeated measure or univariate techniques. The asymptotic distributions of the methods used are investigated and methods for overcoming the practical difficulties in fitting CFs are discussed. The CF-based techniques should allow efficient multivariate analyses of many data sets in human and natural population genetics.

9.
For Genetic Analysis Workshop 19, 2 extensive data sets were provided, including whole genome and whole exome sequence data, gene expression data, and longitudinal blood pressure outcomes, together with nongenetic covariates. These data sets gave researchers the chance to investigate different aspects of more complex relationships within the data, and the contributions in our working group focused on statistical methods for the joint analysis of multiple phenotypes, which is part of the research field of data integration. The analysis of data from different sources poses challenges to researchers but provides the opportunity to model the real-life situation more realistically. Our 4 contributions all used the provided real data to identify genetic predictors for blood pressure. In the contributions, novel multivariate rare variant tests, copula models, structural equation models and a sparse matrix representation variable selection approach were applied. Each of these statistical models can be used to investigate specific hypothesized relationships, which are described together with their biological assumptions. The results showed that all methods are ready for application on a genome-wide scale and can be used or extended to include multiple omics data sets. The results provide potentially interesting genetic targets for future investigation and replication. Furthermore, all contributions demonstrated that the analysis of complex data sets could benefit from modeling correlated phenotypes jointly as well as by adding further bioinformatics information.

10.
Sangbum Choi, Xuelin Huang. Biometrics 2012, 68(4):1126-1135
Summary: We propose a semiparametrically efficient estimation of a broad class of transformation regression models for nonproportional hazards data. Classical transformation models are to be viewed from a frailty model paradigm, and the proposed method provides a unified approach that is valid for both continuous and discrete frailty models. The proposed models are shown to be flexible enough to model long‐term follow‐up survival data when the treatment effect diminishes over time, a case for which the PH or proportional odds assumption is violated, or a situation in which a substantial proportion of patients remains cured after treatment. Estimation of the link parameter in the frailty distribution, considered to be unknown and possibly dependent on time‐independent covariates, is automatically included in the proposed methods. The observed information matrix is computed to evaluate the variances of all the parameter estimates. Our likelihood‐based approach provides a natural way to construct simple statistics for testing the PH and proportional odds assumptions for usual survival data or testing the short‐ and long‐term effects for survival data with a cure fraction. Simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. Applications to two medical studies are provided.

11.
We focus on the problem of generalizing a causal effect estimated on a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods such as inverse propensity sampling weighting are not designed to handle missing values, which are, however, common in both data sources. In addition to coupling the assumptions for causal effect identifiability and for the missing values mechanism and to defining appropriate estimation strategies, one difficulty is to consider the specific structure of the data with two sources and treatment and outcome only available in the RCT. We propose three multiple imputation strategies to handle missing values when generalizing treatment effects, each handling the multisource structure of the problem differently (separate imputation, joint imputation with fixed effect, joint imputation ignoring source information). As an alternative to multiple imputation, we also propose a direct estimation approach that treats incomplete covariates as semidiscrete variables. The multiple imputation strategies and the latter alternative rely on different sets of assumptions concerning the impact of missing values on identifiability. We discuss these assumptions and assess the methods through an extensive simulation study. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and an RCT studying the effect of tranexamic acid administration on mortality in major trauma patients admitted to intensive care units. The analysis illustrates how the handling of missing values can affect the conclusion about the effect generalized from the RCT to the target population.

12.
Zhang D, Lin X, Sowers M. Biometrics 2000, 56(1):31-39
We consider semiparametric regression for periodic longitudinal data. Parametric fixed effects are used to model the covariate effects and a periodic nonparametric smooth function is used to model the time effect. The within-subject correlation is modeled using subject-specific random effects and a random stochastic process with a periodic variance function. We use maximum penalized likelihood to estimate the regression coefficients and the periodic nonparametric time function, whose estimator is shown to be a periodic cubic smoothing spline. We use restricted maximum likelihood to simultaneously estimate the smoothing parameter and the variance components. We show that all model parameters can be easily obtained by fitting a linear mixed model. A common problem in the analysis of longitudinal data is to compare the time profiles of two groups, e.g., between treatment and placebo. We develop a scaled chi-squared test for the equality of two nonparametric time functions. The proposed model and the test are illustrated by analyzing hormone data collected during two consecutive menstrual cycles and their performance is evaluated through simulations.

13.
In this paper, we develop a machine learning system for determining gene functions from heterogeneous data sources using a Weighted Naive Bayesian network (WNB). The knowledge of gene functions is crucial for understanding many fundamental biological mechanisms such as regulatory pathways, cell cycles and diseases. Our major goal is to accurately infer functions of putative genes or Open Reading Frames (ORFs) from existing databases using computational methods. However, this task is intrinsically difficult since the underlying biological processes represent complex interactions of multiple entities. Therefore, many functional links would be missing when only one or two sources of data are used in the prediction. Our hypothesis is that integrating evidence from multiple and complementary sources could significantly improve the prediction accuracy. In this paper, our experimental results not only suggest that the above hypothesis is valid, but also provide guidelines for using the WNB system for data collection, training and predictions. The combined training data sets contain information from gene annotations, gene expressions, clustering outputs, keyword annotations, and sequence homology from public databases. The current system is trained and tested on the genes of budding yeast Saccharomyces cerevisiae. Our WNB model can also be used to analyze the contribution of each source of information toward the prediction performance through the weight training process. The contribution analysis could potentially lead to significant scientific discovery by facilitating the interpretation and understanding of the complex relationships between biological entities.
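The abstract does not state how the weights enter the model. As a hedged illustration only, the sketch below uses one common log-linear form of weighted naive Bayes, in which each evidence source's log-likelihood is multiplied by a source-specific weight. The function name `weighted_nb_scores` and its inputs are hypothetical and are not taken from the WNB system.

```python
import math


def weighted_nb_scores(prior, likelihoods, weights):
    """Return normalized posterior scores under a weighted naive Bayes rule.

    prior       : {class_label: P(class)}
    likelihoods : list over evidence sources of {class_label: P(evidence | class)}
    weights     : one non-negative weight per evidence source

    The log-linear weighting below is one common form of weighted naive
    Bayes; it is an assumption here, not the paper's exact formulation.
    """
    log_scores = {}
    for label, p in prior.items():
        s = math.log(p)
        for w, lik in zip(weights, likelihoods):
            # A weight of 0 ignores the source; a weight of 1 is plain naive Bayes.
            s += w * math.log(lik[label])
        log_scores[label] = s
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(log_scores.values())
    total = sum(math.exp(s - m) for s in log_scores.values())
    return {label: math.exp(s - m) / total for label, s in log_scores.items()}
```

Under this form, comparing the scores obtained with a source's weight set to zero against the scores with its trained weight gives a rough view of how much that source contributes to a prediction.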

14.
15.
Accurate prediction of tumor progression is key for adaptive therapy and precision medicine. Cancer progression models (CPMs) can be used to infer dependencies in mutation accumulation from cross-sectional data and provide predictions of tumor progression paths. However, their performance when predicting complete evolutionary trajectories is limited by violations of assumptions and the size of available data sets. Instead of predicting full tumor progression paths, here we focus on short-term predictions, more relevant for diagnostic and therapeutic purposes. We examine whether five distinct CPMs can be used to answer the question “Given that a genotype with n mutations has been observed, what genotype with n + 1 mutations is next in the path of tumor progression?” or, in short, “What genotype comes next?”. Using simulated data, we find that under specific combinations of genotype and fitness landscape characteristics CPMs can provide predictions of short-term evolution that closely match the true probabilities, and that some genotype characteristics can be much more relevant than global features. Application of these methods to 25 cancer data sets shows that their use is hampered by a lack of information needed to make principled decisions about method choice. Fruitful use of these methods for short-term predictions requires adapting the method’s use to local genotype characteristics and obtaining reliable indicators of performance; it will also be necessary to clarify the interpretation of the method’s results when key assumptions do not hold.

16.
On gene ranking using replicated microarray time course data   (total citations: 1; self-citations: 0; citations by others: 1)
Tai YC, Speed TP. Biometrics 2009, 65(1):40-51
Summary: Consider the ranking of genes using data from replicated microarray time course experiments, where there are multiple biological conditions, and the genes of interest are those whose temporal profiles differ across conditions. We derive a multisample multivariate empirical Bayes' statistic for ranking genes in the order of differential expression, from both longitudinal and cross-sectional replicated developmental microarray time course data. Our longitudinal multisample model assumes that time course replicates are independent and identically distributed multivariate normal vectors. On the other hand, we construct a cross-sectional model using a normal regression framework with any appropriate basis for the design matrices. In both cases, we use natural conjugate priors in our empirical Bayes' setting which guarantee closed form solutions for the posterior odds. The simulations and two case studies using published worm and mouse microarray time course datasets indicate that the proposed approaches perform satisfactorily.

17.
Most existing genome-wide association analyses are cross-sectional, utilizing only phenotypic data at a single time point, e.g. baseline. On the other hand, longitudinal studies, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI), collect phenotypic information at multiple time points. In this article, as a case study, we conducted both longitudinal and cross-sectional analyses of the ADNI data with several brain imaging (not clinical diagnosis) phenotypes, demonstrating the power gains of longitudinal analysis over cross-sectional analysis. Specifically, we scanned genome-wide single nucleotide polymorphisms (SNPs) with 56 brain-wide imaging phenotypes processed by FreeSurfer on 638 subjects. At the genome-wide significance level or a less stringent level, longitudinal analysis of the phenotypic data from the baseline to month 48 identified more SNP-phenotype associations than cross-sectional analysis of only the baseline data. In particular, at the genome-wide significance level, both SNP rs429358 in gene APOE and SNP rs2075650 in gene TOMM40 were confirmed to be associated with various imaging phenotypes in multiple regions of interest (ROIs) by both analyses, though longitudinal analysis detected more regional phenotypes associated with the two SNPs and indicated another significant SNP rs439401 in gene APOE. In light of the power advantage of longitudinal analysis, we advocate its use in current and future longitudinal neuroimaging studies.

18.
Cross-sectional HIV incidence estimation based on a sensitive and less-sensitive test offers great advantages over the traditional cohort study. However, its use has been limited due to concerns about the false negative rate of the less-sensitive test, reflecting the phenomenon that some subjects may remain negative permanently on the less-sensitive test. Wang and Lagakos (2010, Biometrics 66, 864-874) propose an augmented cross-sectional design that provides one way to estimate the size of the infected population who remain negative permanently and subsequently incorporate this information in the cross-sectional incidence estimator. In an augmented cross-sectional study, subjects who test negative on the less-sensitive test in the cross-sectional survey are followed forward for transition into the nonrecent state, at which time they would test positive on the less-sensitive test. However, considerable uncertainty exists regarding the appropriate length of follow-up and the size of the infected population who remain nonreactive permanently to the less-sensitive test. In this article, we assess the impact of varying follow-up time on the resulting incidence estimators from an augmented cross-sectional study, evaluate the robustness of cross-sectional estimators to assumptions about the existence and the size of the subpopulation who will remain negative permanently, and propose a new estimator based on abbreviated follow-up time (AF). Compared to the original estimator from an augmented cross-sectional study, the AF estimator allows shorter follow-up time and does not require estimation of the mean window period, defined as the average time between detectability of HIV infection with the sensitive and less-sensitive tests. It is shown to perform well in a wide range of settings. We discuss when the AF estimator would be expected to perform well and offer design considerations for an augmented cross-sectional study with abbreviated follow-up.
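For orientation, the role of the mean window period can be shown with a simplified form of the basic cross-sectional ("snapshot") estimator. This is not the AF estimator proposed in the article, which avoids estimating the window period. The sketch assumes nobody remains permanently negative on the less-sensitive test, and the function and parameter names are hypothetical.

```python
def snapshot_incidence(n_negative, n_recent, mean_window_years):
    """Simplified cross-sectional incidence estimate (per person-year).

    n_negative        : number testing negative on the sensitive test
    n_recent          : number positive on the sensitive test but negative
                        on the less-sensitive test ("recent" infections)
    mean_window_years : mean window period, in years

    Assumes (for illustration) that nobody stays permanently negative
    on the less-sensitive test.
    """
    if n_negative <= 0 or mean_window_years <= 0:
        raise ValueError("n_negative and mean_window_years must be positive")
    return n_recent / (n_negative * mean_window_years)
```

With 1000 uninfected subjects, 10 recent infections, and a mean window period of half a year, the sketch gives 10 / (1000 × 0.5) = 0.02 infections per person-year. Subjects who never convert on the less-sensitive test would be miscounted as recent, which inflates the estimate and motivates the augmented design discussed above.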

19.
In this paper we review the methodological underpinnings of the general pharmacogenetic approach for uncovering genetically driven treatment effect heterogeneity. This typically utilises only individuals who are treated and relies on fairly strong baseline assumptions to estimate what we term the ‘genetically moderated treatment effect’ (GMTE). When these assumptions are seriously violated, we show that a robust but less efficient estimate of the GMTE that incorporates information on the population of untreated individuals can instead be used. In cases of partial violation, we clarify when Mendelian randomization and a modified confounder adjustment method can also yield consistent estimates for the GMTE. A decision framework is then described to decide when a particular estimation strategy is most appropriate and how specific estimators can be combined to further improve efficiency. Triangulation of evidence from different data sources, each with their inherent biases and limitations, is becoming a well-established principle for strengthening causal analysis. We call our framework ‘Triangulation WIthin a STudy’ (TWIST) in order to emphasise that an analysis in this spirit is also possible within a single data set, using causal estimates that are approximately uncorrelated, but reliant on different sets of assumptions. We illustrate these approaches by re-analysing primary-care-linked UK Biobank data relating to CYP2C19 genetic variants, Clopidogrel use and stroke risk, and data relating to APOE genetic variants, statin use and Coronary Artery Disease.

20.
In biomedical research, hierarchical models are very widely used to accommodate dependence in multivariate and longitudinal data and for borrowing of information across data from different sources. A primary concern in hierarchical modeling is sensitivity to parametric assumptions, such as linearity and normality of the random effects. Parametric assumptions on latent variable distributions can be challenging to check and are typically unwarranted, given available prior knowledge. This article reviews some recent developments in Bayesian nonparametric methods motivated by complex, multivariate and functional data collected in biomedical studies. The author provides a brief review of flexible parametric approaches relying on finite mixtures and latent class modeling. Dirichlet process mixture models are motivated by the need to generalize these approaches to avoid assuming a fixed finite number of classes. Focusing on an epidemiology application, the author illustrates the practical utility and potential of nonparametric Bayes methods.
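One standard way to see how a Dirichlet process avoids fixing the number of classes is the stick-breaking construction of its mixture weights. The sketch below is a generic, truncated illustration and is not drawn from the article; the function name, the truncation level `n_components`, and the concentration parameter name `alpha` are choices made here for illustration.

```python
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    Each v_k ~ Beta(1, alpha); weight k is v_k times the length of stick
    left after the first k - 1 breaks. Returns the weights and the
    unassigned remainder of the stick (the truncation error).
    """
    weights = []
    remaining = 1.0
    for _ in range(n_components):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)   # break off a fraction v of what is left
        remaining *= 1.0 - v            # shrink the remaining stick
    return weights, remaining
```

Smaller values of `alpha` tend to put most of the mass on the first few components, while larger values spread it over many, so the effective number of classes is governed by `alpha` and the data rather than fixed in advance.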
